Knowledge and belief in the times of COVID-19: A comparative analysis of epistemicity in English newspaper discourse of two stages of the pandemic

This paper sets forth a quantitative analysis of expressions of epistemicity, a category covering the expression of commitment to the information transmitted and comprising epistemic modality and evidentiality, in a corpus of 400 newspaper articles from The Guardian concerning the COVID-19 pandemic. Two hundred articles were written in April 2020; the other 200 were written between January and April 2022, after massive vaccination and an extraordinary increase in medical knowledge. The analysis distinguishes between a number of subtypes of epistemic expressions and three kinds of authorial voice. The results show that the April 2020 articles contain more epistemic expressions, of both weak commitment (might, perhaps, apparently…) and strong commitment (know, clearly, surely…), which suggests a greater need to distinguish the known from the unknown in this period, due to the pervasive state of uncertainty. The analysis has social implications, since it gives readers an opportunity to appreciate the careful assessments of epistemicity found in the corpus and therefore to consider the advisability of obtaining information from quality media. These social implications, together with the methodology of the analysis, contribute to the potential of the paper for pedagogical applications.


Introduction
The first two years of the COVID-19 pandemic witnessed an unprecedented growth of medical knowledge about the virus SARS-CoV-2 and the illness it provoked. In the first months of the pandemic (declared by the WHO on 11 March 2020), everyone, and especially medical doctors and health personnel, was confronted with a brand new contagious virus that was deadly or caused serious harm to many people, especially but not exclusively the elderly and those with pre-existing health conditions. The governments of most European countries decreed lockdowns between March and May-June 2020, and the reduction of people's mobility soon proved to be an efficient means to slow the spread of the virus. In the meantime, and in the following months, scientific knowledge steadily increased: some well-known milestones were the confirmation of masks as a useful device to reduce the spread of the illness, the advantages of prone positioning for patients in intensive care units, and obviously the appearance of vaccines and their extended use in the population from mid-2021.
E-mail address: mcarrete@ucm.es
The role of mass media during the pandemic has been (and still is) crucial, since they are in charge of communicating the news of all kinds, for instance those concerning progress and scientific knowledge, government policies and financial consequences, amongst others. Not surprisingly, research on newspaper discourse in the pandemic soon started to appear: some examples are Andreouli and Brice (2021), Tejedor et al. (2020), Orts and Vargas-Sierra (2022), Rovino et al. (2021) or the collection of papers compiled by Arriaga et al. (2021). This study adds to existing knowledge on newspaper discourse about the pandemic, by setting forth a quantitative analysis of expressions of epistemicity (i.e. qualification of commitment to the information transmitted) in newspaper discourse. The analysed corpus consists of newspaper articles from The Guardian and its sister newspaper The Observer 1 whose main topic is COVID-19. The corpus, compiled by the author of this paper, was divided into two subcorpora: the first contains articles published in April 2020, a time of lockdown in the UK and many other countries, when uncertainty about the illness itself and its collateral effects was pervasive; the second subcorpus contains articles published in January-April 2022, when scientific knowledge and its application had already improved the health situation and also its collateral effects. The comparative analysis of epistemicity of the two subcorpora aims at (dis)confirming the following hypotheses:
Hypothesis 1. The increase in knowledge about SARS-CoV-2 and COVID-19 through stages of the pandemic will be reflected in a lesser use of epistemic expressions. For this reason, the earlier articles are predicted to contain more epistemic expressions than the later articles.
Hypothesis 2. This quantitative difference is predicted to be remarkably larger for expressions of weak epistemic commitment, such as modal auxiliaries ( may, might, could ) or adverbs ( perhaps, maybe, probably, apparently ), compared to expressions of knowledge or strong commitment, such as the verb know 2 and the adverbs certainly, clearly and obviously .
The choice of The Guardian , a newspaper of centre-left leaning, was due to its prestige and to the result of a study carried out at the beginning of the pandemic by the University of Oxford, whose conclusion was that readers considered it to have offered the best coronavirus coverage of all the media. 3 This study has social implications: knowledge of the use of epistemic expressions in this quality newspaper might help people to appreciate the carefulness with which it assesses the information transmitted, distinguishing known facts from probabilities or apparent facts. This appreciation might lead to consideration of the media from which readers obtain the information, so that, before embracing beliefs about the pandemic, it should be quality media that count, to the detriment of less reliable media.
The structure of the paper is as follows. Section 2 describes the corpus analysed. Section 3 deals with the theoretical framework, based on a general category of epistemicity that comprises epistemic modality and evidentiality. Section 4 covers the method used in the selection and labelling of expressions for quantitative analysis. Section 5 presents and discusses the results of this analysis. Section 6 points out a number of social implications and pedagogical applications. Finally, Section 7 sums up the main conclusions and provides suggestions for further research.

The corpus
The corpus under analysis consists of 400 articles from The Guardian whose main topic is the COVID-19 pandemic. The articles include news, opinion articles, and some interviews. The main topic of the selected articles was the pandemic and its evolution from a medical point of view; the articles, then, aimed mainly at informing readers about the illness itself and/or the measures adopted by governments to cope with it. The selected articles were those that, according to the headline and the lead, concentrated on the following issues:
-Health situation in the UK and other countries;
-Scientific advances concerning knowledge, detection and treatment of the illness, and ways to reduce its spread;
-Policies of the UK and other countries, and people's reactions towards these policies such as denialism or protests;
-Fake news about the illness and ways to avoid belief in their contents.
Other related topics were not included, namely the side effects of the pandemic on people's lives other than the health problems directly provoked by the virus: for example, mental health issues, staff reductions in companies with the consequent rise in unemployment, the increase in domestic violence or the influence of COVID-19 on literature and art. Also excluded were the articles about the COVID-19 illness of powerful or famous persons, such as Boris Johnson or Queen Elizabeth II, and about events whose possible consequences were political rather than medical, such as the scandal about gatherings held by government members and Conservative party staff ('Partygate') during the pandemic.
The corpus was divided into two subcorpora containing 200 articles each, one with articles published in April 2020 (Subcorpus A) and the other with articles published in January, February, March and April 2022 (Subcorpus B). For the two periods, the number of eligible articles was higher than the number of articles actually selected (see Table 1 ).
The selection amongst eligible articles was done according to two criteria. The first concerns the main topic: priority was given to the articles which highlighted new information on the issues of the pandemic specified above or offered an interpretation of quantitative data about the number of deaths or infected people; by contrast, those articles focusing on the mere transmission of quantitative data were discarded. The second criterion is distributional: whenever possible, each day of the periods under analysis was represented by at least three articles for Subcorpus A and one article for Subcorpus B. The application of this criterion has led to a roughly proportional distribution of eligible and selected articles across periods, as may be seen in Table 1 .
The differences in numbers of eligible and selected articles between Subcorpus A and B and between the periods included in Subcorpus B are well accounted for by the differences between the stages of the pandemic across periods. Subcorpus A corresponds to the first wave of the pandemic and the first national lockdown in the UK and many other countries. 4 As is well-known, this time witnessed a high number of cases of infection, part of which were serious and even fatal, with the consequent overflow of hospitals and health services. This scenario, together with the scant scientific knowledge available on SARS-CoV-2 and COVID-19, provoked a general feeling of uncertainty, anxiety and fear. In this period, the pandemic was the topic par excellence in the media, as shown by the number of eligible articles. The 200 selected articles total 191,716 words.
Subcorpus B contains 200 articles about a later phase of the pandemic, namely January-April 2022. At this time, the health situation was drastically improved, due to the increase of knowledge on the virus and medical measures to cope with it, including mass vaccination available for nearly all the population in the Western world and other countries. The articles belonging to this subcorpus are shorter on average: they total 159,550 words. The distribution of the number of articles along these four months is roughly proportional to the relative weight of the COVID-19 coverage in The Guardian within its overall contents in each of the four months. The evolution of the pandemic and its coverage, and the selection of articles for each month, may be described as follows:
-January 2022. The main events of this month were the Omicron variant of the virus, which surged in December 2021 and brought an increase in the number of cases (although most were not severe), and the third dose of the vaccine, prescribed for large segments of the population as a booster shot.
The metadata of the compiled articles for all the subcorpora, namely URL, headline, author, number of words and date, and a code assigned to each article, are specified in the Appendix. All the examples cited from the corpora contain an indication of the code of the article from which the text was extracted.

Epistemicity: concept and scope
In this paper, the term 'epistemicity' is used in the sense proposed by Boye (2012) and adopted in other works such as Carretero et al. (2017, 2022) or Marín-Arrese (2021a, 2021b). Epistemicity consists in the expression of justificatory support (or, in other words, commitment to the validity) for the communicated proposition. Epistemicity comprises two categories, epistemic modality and evidentiality. Epistemic modality expresses the estimation of the chances for a proposition to be or become true (cf. Nuyts 2001, p. 21), while evidentiality expresses the kind, source and/or evaluation of evidence for or against the truth of a proposition. 5 Epistemic modality is illustrated in (1) by could and might, which assess the truth of the propositions "coronavirus is ripping through some of the poorest and most overcrowded parts of Britain's cities" and "cramped conditions are accelerating the spread of the virus" as possible, not certain. Evidentiality is illustrated in (2) by appears, which indicates that the commitment to the truth of the proposition "Merck's pill is less effective" is not full but agrees with evidence (appearances in this case):
(1) Fears are growing that coronavirus could ⟨EPI, AS ⟩ 6 be ripping through some of the poorest and most overcrowded parts of Britain's cities as new research suggests cramped living conditions might ⟨EPI, AS ⟩ be accelerating the spread of the virus. (GAPR20-045).
(2) The US has paid $5.3 billion for 10 m courses of Pfizer's new treatment, as well as $2.2bn for treatment from rival Merck, whose pill appears ⟨EVI, W ⟩ to be less effective. (GJAN22-002).
Throughout the paper, the noun 'epistemicity' and the adjective 'epistemic' are used to refer to the large category, while the adjectives corresponding to the subcategories of epistemic modality and evidentiality are 'epistemic-modal' and 'evidential', respectively.

Expressions covered in the analysis
The coverage of epistemic-modal and evidential expressions in this paper includes a number of 'core' expressions of these categories, often included in the literature on epistemicity, 7 whose inclusion in a quantitative analysis provides an in-depth view on the expression of epistemicity in the corpus under study. The expressions included are the following:
-Modal auxiliaries with epistemic-modal meaning: may, might, could, must.
-Epistemic-modal adverbs:
  -Expressing medium or low probability: likely, maybe, perhaps, possibly, probably;
  -Expressing certainty: certainly, surely.
-Evidential adverbs:
  -Adverbs of appearances: apparently, seemingly;
  -Adverbs indicating reportative evidentiality, i.e. evidentiality based on communicative messages, which may be of any kind, from concrete reports to hearsay: allegedly, reportedly;
  -Adverbs indicating strong evidence leading to high commitment to the information transmitted: clearly, evidently, obviously.
-Epistemic-modal adjectives: likely, possible, probable, unlikely.
-Evidential adjectives: apparent, clear, unclear, evident, obvious.
-Epistemic-modal complement-taking verbs with a meaning of knowledge or belief, also known as propositional attitude verbs: believe, think, know.
-Evidential lexical verbs: appear, look, seem.
These expressions do not exhaust the domain of epistemicity, but they may be considered to be sufficiently representative for characterising the role of epistemicity in the corpus under study. 8 Some of the included expressions have been argued in the literature to allow for both epistemic-modal and evidential readings, depending on whether their main function in context is to estimate the chances for the proposition to be or become true, or else to trigger inferences based on evidence. This position is adopted, for example, in Cornillie (2009) or Alonso-Almeida (2014). In addition, some expressions are argued to be 'epistential' in the sense of having an epistemic-modal and an evidential component: for instance, Carretero (2020) considered clearly and its Spanish equivalent claramente as epistential on the grounds that their meaning has an evidential meaning component of strong evidence and also an epistemic-modal component of high commitment to the truth of the proposition. However, given the high number of occurrences of these expressions in the corpus, they all have been assigned a single value (epistemic-modal or evidential), as stated above.
5 The terms 'true' and 'truth' are used acknowledging that we humans only have access to the perception of reality through our senses, bodies and minds.
6 The meaning of the abbreviations inside the angle brackets is specified in Section 4.
7 Studies on a wide range of expressions of epistemicity are approached in the special issue of Journal of Pragmatics called Epistemicity and stance in English and other European languages: Discourse-pragmatic perspectives, ed. by Carretero et al. (2022). Monographs on specific types of these expressions are: on modal auxiliaries, Coates (1983), Palmer (1990), Perkins (1983), and Collins (2009); on modal adverbs, Simon-Vandenbergen and Aijmer (2007) and Hoye (1997); on complement-taking verbs, Cappelli (2007) and Whitt (2010).
8 To further limit the scope of this research with a view to obtaining a manageable number of occurrences for in-depth discussion, the modal auxiliaries will and would have been excluded: the occurrences of will total 623 in Subcorpus A and 699 in Subcorpus B, and the occurrences of would total 574 in Subcorpus A and 461 in Subcorpus B. Other excluded expressions are: the modal auxiliaries can (which can express epistemic modality only when negated), should and ought to, whose epistemic-modal meaning is infrequent in comparison to the deontic meaning of obligation or recommendability (Coates 1983); epistemic-modal adjectives such as sure or certain; the evidential verb of appearance see and evidential verbs of mental processes such as hint or suggest; evidential adverbs of lesser frequency such as patently or visibly (Carretero 2019), or adverbials containing epistemic-modal or evidential nouns such as to our knowledge or in all likelihood. The paper also excludes the area of apprehension (Lichtenberk 1995; Caballero and Díaz-Vera 2021), which combines degree of certainty with (un)desirability; core examples are the verbs hope, fear and wish. This area, already approached for COVID-19 newspaper discourse by Rovino et al. (2021), would deserve an independent study based on the present corpus.
The analysis only covers the occurrences of the listed expressions with epistemic meanings, leaving out non-epistemic occurrences. Firstly, the non-epistemic meanings of polysemous expressions are discarded: -The modal auxiliaries also have other modal meanings: deontic modality (obligation or permission), as in (3-4) or dynamic modality (potential due to internal or external circumstances, ability or tendency), as in (5): (3) UK must learn from German response to Covid-19, says Whitty (GAPR20-024). (4) At a news conference in New Delhi, Trump says: "You may ask about the coronavirus, which is very well under control in our country […] " (GAPR20-068). (5) "On the day, I didn't even realise she had done it. It helped that I walked into the place not wearing glasses so I couldn't see anything anyway.
-The modal adjective possible and the corresponding adverb possibly can also express deontic modality or also dynamic modality, an example of which is possible in (6): (6) "What I'm lobbying for is for when it comes to those circumstances where it's just not possible to keep your public distance, think of public transport, […] " (GAPR20-086).
-The adjective clear and the adverb clearly have a non-epistemic meaning of clarity (distinctness) exemplified in (7): (7) Current rules in all UK nations now make a clear distinction in selfisolation requirements for vaccinated and unvaccinated people if they come into contact with someone who has tested positive for Covid. (GJAN22-057).
Apart from these cases of polysemy, the lexical verbs think , believe and know were also discarded when they do not have propositional scope, i.e. they do not qualify a proposition that may be true or false. For instance, the verb know in (8) is epistemic-modal: the proposition under its scope is expressed by the stretch from "close " until the end of the example. However, in (9), know is not epistemic because its scope is "a lot of families who are incredibly angry ", which is an expression referring to a group of humans (a type of first-order entity), not a proposition that can be true or false. Also excluded is the discourse marker you know (often pronounced y'know ), whose function is "to mark transitions in information state which are relevant for participation frameworks " ( Schiffrin, 1987 , p. 267).
(8) We know ⟨EPI, 1p, W, pres ⟩ that close or sustained contact is usually required to spread it from one person to another. (GAPR20-004). (9) "I know a lot of families who are incredibly angry. " (GAPR20-109).
The epistemic-modal verbs of knowledge and belief were also excluded when they were inside the scope of an irrealis expression and consequently did not refer to real thoughts, beliefs or (lack of) knowledge entertained by the speaker or writer. Some irrealis contexts are the imperative mood, the collocation want to + infinitive or conditional clauses. For example, think in (10) was excluded because it lies within the scope of a conditional clause and hence does not refer to any belief entertained by the cited speaker (virologist Jonathan Ball) but to hypothetical beliefs that may arise: (10) "[…] Obviously ⟨EVI, DRS ⟩, if you think you have Covid-19 and share a house with a cat, then it would be sensible to limit close interactions with your furry friend until you are better. " (GAPR20-009)

Method of analysis
The analysis carried out on the expressions listed above was manual, for two reasons. Firstly, epistemic and non-epistemic occurrences had to be discriminated for most of the expressions; secondly, authorial voice was also considered, in the way specified below in this section.
The occurrences of the expressions considered as epistemic were annotated on their right with angle brackets containing a number of labels inside them. This marking facilitated searches for the quantitative analysis. As was seen in the examples previously cited, the leftmost label is "EPI " for epistemic-modal expressions and "EVI " for evidential expressions.
Authorial voice was also registered, in order to detect further nuances in the use of epistemic expressions in the two subcorpora. Accordingly, a distinction was made between epistemic qualifications stemming from the writer of the article (labelled as "W") and those belonging to discourse attributed to other persons and institutions. An occurrence of the first kind is (11),
(11) But these are not normal times. There is an urgent need to scale up testing. This might ⟨EPI, W ⟩ require that we adapt protocols to use whichever reagents and equipment are available. (GAPR20-012)
where it is the writer of the article who assesses the mentioned adaptation of protocols as possible rather than highly probable or certain. By contrast, the occurrences where the epistemic expressions lie inside speech attributed to different sources from the writers of the articles are divided into direct reported speech ("DRS"), signalled by quotation marks, and all the other cases of attributed speech ("AS"). The AS label includes indirect reported speech introduced by reporting verbs such as say, state or suggest, and also reference to more vague sources such as rumours, hopes or fears. Example (12) contains an AS epistemic expression with a vague source ('fears'), and another with a concrete source ('new research'):
(12) Fears are growing that coronavirus could ⟨EPI, AS ⟩ be ripping through some of the poorest and most overcrowded parts of Britain's cities as new research suggests cramped living conditions might ⟨EPI, AS ⟩ be accelerating the spread of the virus. (GAPR20-045)
In cases of mixed reported speech, the labels "DRS" and "AS" are used depending on whether the epistemic expression is inside or outside the quotation marks, respectively (13-14):
(13) Gerald Gartlehner, a leading virologist and previously one of the country's more cautious voices, said this week it was "probably ⟨EPI, DRS ⟩ time to reassess" mandatory jabs, since the new, highly transmissible variant would create unprecedented levels of immunity. (GJAN22-024).
(14) Chris Hopson, its chief executive, added that the 100,000 target may ⟨EPI, AS ⟩ have a "galvanising effect" but "what matters most is an updated strategy to take us through the exit from lockdown". (GAPR20-190).
The annotation system also includes additional specifications for a number of expressions:
-The labels for the epistemic-modal lexical verbs contain specification of person: first singular, first plural, second or third. For first person singular and plural, the distinction W/DRS/AS is maintained. For the second and third person, there is no such distinction, since these persons already signal that the epistemic qualification comes from sources other than the author of the article. The annotation also includes tense, making the distinction between present tense, past tense, and other forms (infinitive, -ing participle and -ed participle). This annotation method is illustrated with examples (15-16):
(15) Jeff, a retired trucker who arrived in Ottawa 17 days ago, said a shared sense of purpose has only increased the protesters' resolve. "There's a sense of unity here -and I think ⟨EPI, 1s, DRS, pres ⟩ it's only increased since yesterday. When you've seen the light, it's hard to [go] back to the dark." (GFEB22-034).
(16) The scientists said dirty air was already known ⟨EPI, 3, past ⟩ to increase the risk of acute respiratory distress syndrome, which is extremely deadly and a cause of Covid-19-related deaths, as well as other respiratory and heart problems. (GAPR20-026).
In the case of the verb know , a distinction is also made between affirmative and negative occurrences, for the reason that this verb indicates full knowledge in affirmative clauses, as in (16) above, but lack of knowledge when negated, as in (17): (17) "We didn't know ⟨EPI, 1p, DRS, past, neg ⟩ we were infecting ourselves, " Ken says. "I am really annoyed when I start thinking about it too much. I am furious with the government, with people making decisions, that the virus was spreading at that time. " (GAPR20-195).
This distinction is not made for the other lexical verbs analysed, namely the epistemic-modal verbs think and believe and the evidential verbs appear, look and seem. These verbs express medium commitment to the truth of the proposition, and the construction with the negated verb is roughly equivalent to the construction with the verb in the affirmative and the complement clause with the opposite polarity; this phenomenon is sometimes called transferred negation. For example, (18) is roughly paraphraseable with the constructed example (19).
(18) Ferguson said he did not think ⟨EPI, 3, past ⟩ the predictions could be relied on. (GAPR20-023). (19) Ferguson said he thought the predictions could not be relied on.
Finally, two details of the annotation are the distinction between likely as an adjective and as an adverb, and the joint analysis of the combinations might seem and seem(s) likely as occurrences of seem and likely respectively, since the first expression modifies the epistemic qualification encoded by the second expression.
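The angle-bracket labels described in this section lend themselves to automatic retrieval once the manual annotation is done. The following is a minimal sketch, not the procedure actually used in the study: it assumes that each annotation immediately follows the annotated word and that labels are comma-separated, as in the cited examples. The function name `parse_annotations` and the field names are hypothetical.

```python
import re
from collections import Counter

# Matches a word followed by an annotation in angle brackets,
# e.g. "might ⟨EPI, W ⟩" or "know ⟨EPI, 1p, DRS, pres, neg ⟩".
# Both ⟨…⟩ and <…> are accepted, since both shapes occur in the examples.
TAG = re.compile(r"(\w+)\s*[⟨<]\s*([^⟩>]*?)\s*[⟩>]")

# The three authorial-voice labels described in the text.
VOICES = {"W", "DRS", "AS"}


def parse_annotations(text):
    """Return one dict per annotated expression found in `text`."""
    results = []
    for word, raw in TAG.findall(text):
        labels = [label.strip() for label in raw.split(",")]
        results.append({
            "expression": word.lower(),
            "category": labels[0],  # "EPI" or "EVI"
            # Voice is absent for second- and third-person lexical verbs.
            "voice": next((l for l in labels[1:] if l in VOICES), None),
            # Remaining labels: person, tense, polarity, word class.
            "extra": [l for l in labels[1:] if l not in VOICES],
        })
    return results


# Fragment of example (12) from the corpus.
sample = (
    "Fears are growing that coronavirus could ⟨EPI, AS ⟩ be ripping through "
    "some of the poorest parts of Britain's cities as new research suggests "
    "cramped living conditions might ⟨EPI, AS ⟩ be accelerating the spread."
)
counts = Counter((a["category"], a["voice"]) for a in parse_annotations(sample))
print(counts)
```

Counting pairs of category and voice in this way mirrors the kind of search that the marking is said to facilitate; inflected forms (know, known, knew) would still need to be grouped under one lemma by hand or with a lookup table.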

Results and discussion
This section presents and discusses the results of the quantitative analysis of epistemic expressions specified above in the two subcorpora. The section starts with a presentation and discussion of the overall frequency of the expressions and of the three categories of authorial voice where applicable, continues with the subcategories of expressions specified in 3.2., whose individual study uncovers distributional differences, and ends with a summary of the quantitative results.

Overall frequency of expressions and authorial voices
The overall frequency of the different types of expressions, specified in Table 2 , shows that epistemic-modal auxiliaries are by far the most common type of expression of epistemicity in both subcorpora, followed by epistemic-modal lexical verbs. The results also uncover that expressions of epistemicity are more common in Subcorpus A than in Subcorpus B, which agrees with Hypothesis 1.
The quantitative difference across expressions is significant, considering that the p value is smaller than the threshold value for significance usually adopted in linguistics ( p ≤ 0.05). However, the difference is moderate rather than drastic. The reason is, in all probability, that despite the improvement of the health situation, much was still to be known about the pandemic in the time Subcorpus B articles were written. For instance, there was (and still is at the time of writing this paper) no sound knowledge about the amount of virus needed for infection, nor of the antibodies needed to overcome it, nor of the factors accounting for variations in propensity to infection from person to person. An example of remaining doubts is example (20), extracted from an article dated 26 February 2022, which illustrates the uncertainty provoked by the variant Omicron (recent at that time): (20) "The Omicron variant did not come from the Delta variant. It came from a completely different part of the virus's family tree. And since we don't know ⟨EPI, 1p, DRS, pres, neg ⟩ where in the virus's family tree a new variant is going to come from, we cannot know ⟨EPI, 1p, DRS, pres, neg ⟩ how pathogenic it might ⟨EPI, DRS ⟩ be. It could ⟨EPI, DRS ⟩ be less pathogenic but it could ⟨EPI, DRS ⟩, just as easily, be more pathogenic, " he said. (GFEB22-026).
The ratio of all the expressions is higher in Subcorpus A for all the types except for epistemic-modal and evidential adjectives, which have a slightly higher ratio in Subcorpus B. A possible reason for this peculiarity of the adjectives lies in that they do not explicitly present the epistemic-modal or evidential qualification as stemming from the mind of the speaker/writer (i.e. the journalist in W occurrences or the cited source of the attributed speech in DRS or AS occurrences), but as easily shared by readers. Along these lines, epistemic-modal adjectives are characterised as 'objective' (Halliday and Matthiessen, 2014, pp. 688-689) or as 'intersubjective' in Nuyts (2001, 2017). This use of intersubjective expressions might be due to the assumption that by that time readers already had medical knowledge on the pandemic and were therefore ready to share epistemic qualifications that seemed reasonable. Let us consider (21),
(21) France's prime minister, Jean Castex, hinted at the same concern, saying on Thursday that making vaccination compulsory would not be helpful because it was likely ⟨EPI, ADJ, AS ⟩ ultimately to create more problems than solutions. (GJAN22-024)
where the consideration of making vaccination compulsory as a likely source of more problems than solutions is presented as common sense and predictably shared by readers, who had already witnessed previous vaccination campaigns, rather than a personal estimation of Jean Castex's mind.
The distribution of the three subtypes of authorial voice throughout epistemic expressions is specified in Table 3. It must be noted that this distinction is not applicable to the non-first person occurrences of the epistemic-modal lexical verbs (think, believe and know), which necessarily attribute the epistemic qualification to a different voice from that of the journalist.
As the table shows, the total number of expressions of the three subtypes is higher in Subcorpus A than in Subcorpus B. The percentages show that the number of expressions communicating the writer's authorial voice is proportionally almost the same; by contrast, direct reported speech is proportionally more common in Subcorpus B, while non-direct reported speech is more frequent in Subcorpus A. Considering that the total number of direct reported speech occurrences is still higher in Subcorpus A, the quantitative difference in occurrences of non-direct attributed speech may be explained by the quantitative difference in the occurrences of modal auxiliaries, for which this subtype of voice is most frequent in the two subcorpora. It is likely that, given the pervasive state of uncertainty in April 2020, the proportion of modalised statements in comparison to unmodalised statements expressing actual facts is larger in Subcorpus A than in Subcorpus B; in this context, journalists often communicated tentative opinions of experts, political authorities, reports or scientific studies by means of non-direct reported speech containing epistemic modal auxiliaries. An example is (22), which contains three occurrences of weak modal auxiliaries within the scope of non-direct reported speech:
(22) The huge stock of 17.5 m antibody home testing kits ordered by the government after Boris Johnson said they could ⟨EPI, AS ⟩ be a "game changer" could ⟨EPI, AS ⟩ in fact be unreliable, scientists have said, saying that they may ⟨EPI, AS ⟩ fail to detect up to half of coronavirus cases. (GAPR20-019).
As stated above, Sections 5.2 to 5.7 present and discuss the results of epistemic expressions of the different subtypes. Chi-square with six degrees of freedom = 22.0551; p = 0.0011.
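For readers who wish to check the reported chi-square figure, the following is a minimal sketch rather than the procedure actually used in the study. It relies on the closed-form upper-tail probability that holds for an even number of degrees of freedom; the function name `chi2_sf_even_df` is an illustrative assumption, not part of any library.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable.

    Uses the closed form valid only for a positive even number of
    degrees of freedom: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    partial_sum = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * partial_sum


# Statistic and degrees of freedom as reported in the text above.
p_value = chi2_sf_even_df(22.0551, 6)
print(f"p = {p_value:.4f}")
```

The result is close to 0.001, in line with the reported value, so the difference in distribution would count as significant at the conventional 0.05 and 0.01 levels.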

Epistemic-modal auxiliaries
The raw frequencies and the ratio per thousand words of the epistemic-modal auxiliaries in the two subcorpora are specified in Table 4 . The numbers show that the distribution of occurrences across modal auxiliaries and authorial voices is quite homogeneous across both subcorpora.
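The ratio per thousand words mentioned here is a simple normalisation of raw frequencies by subcorpus size. The sketch below illustrates the arithmetic only; the counts and word totals are hypothetical placeholders, not figures from Table 4, and the helper name `per_thousand` is an assumption made for illustration.

```python
def per_thousand(raw_count: int, total_words: int) -> float:
    """Normalised frequency: occurrences per 1,000 running words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 1000.0 * raw_count / total_words


# Hypothetical raw counts and subcorpus sizes (NOT the study's data).
raw_counts = {"A": 120, "B": 90}
word_totals = {"A": 200_000, "B": 180_000}

ratios = {
    name: round(per_thousand(raw_counts[name], word_totals[name]), 2)
    for name in raw_counts
}
print(ratios)
```

Normalising in this way allows subcorpora of different sizes to be compared, which raw frequencies alone would not permit.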
The most common modal auxiliary is could, followed by may and might, while must is rare. As for authorial voice, AS is by far the most common category, as was mentioned in Section 5.1. This high frequency, together with the DRS occurrences, indicates that, in most cases, journalists tend to embed the auxiliaries and the modalised propositions in attributed discourse rather than present the modal qualifications as their own. However, a difference may be found between the two subcorpora, in that the second most frequent category is W in Subcorpus A and DRS in Subcorpus B. That is to say, in Subcorpus A writers of articles are more prone to use modal auxiliaries for formulating epistemic assessments themselves, not always leaving the task to cited sources.
The difference is most remarkable for might, where the W occurrences in Subcorpus A more than double those of Subcorpus B. An illustrative example is (23), extracted from an article about the UK's options for lifting the lockdown, which contains a W occurrence of might and another of could:

(23) Those who test positive for coronavirus antibodies will presumably have some immunity and in principle might ⟨EPI, W ⟩ be allowed back to work. It could ⟨EPI, W ⟩ make a dramatic difference for NHS staff and other carers who work with vulnerable people. (GAPR20-035).

In this example, the modals may be interpreted as communicating not only epistemic-modal qualifications but also, from the pragmatic viewpoint, the conversational implicature that the activation by the Government of the possibility expressed by the might-utterance would be welcome, given the possible consequences envisaged in the could-utterance.

Epistemic-modal and evidential adverbs
The quantitative results for epistemic-modal and evidential adverbs, specified in Tables 5 and 6 , show that they are much less common than the modal auxiliaries. The adverbs are also more frequent in Subcorpus A than in Subcorpus B, the only exceptions being likely and allegedly , while reportedly occurs with a very similar frequency in the two subcorpora.
The absolute frequency of the epistemic-modal adverbs in Subcorpus A more than doubles that of Subcorpus B. This quantitative difference is especially strong for the adverbs of lowest probability perhaps, maybe and possibly, where the occurrences total 71 in Subcorpus A and 22 in Subcorpus B. However, the difference is also remarkable for the certainty adverbs certainly and surely, and also for the strong evidential adverbs clearly and obviously. This quantitative difference disconfirms Hypothesis 2, and its reason plausibly lies in the need to clearly signal what was certain within a situation of uncertainty. Sometimes these certainties were desirable, as in (24), a DRS occurrence attributed to an infectious disease specialist; in other cases, they were undesirable, as in (25), extracted from an editorial on The Observer's view on the coronavirus crisis in the UK:

(24) "But certainly ⟨EPI, DRS ⟩ when we start [the] drug, we see fever curves falling," she said. (GAPR20-093).

(25) But the death toll at this point is surely ⟨EPI, W ⟩ higher than it needed to be. Why, when we had longer to prepare than countries such as Spain and Italy, are our daily tolls higher than the figures at what looks ⟨EVI, W ⟩ to be their peak - even though we are still thought ⟨EPI, 1p, W, pres ⟩ to be at least two weeks away from ours? (GAPR20-048).

Concerning authorial voice, W-occurrences are more common in Subcorpus A. This quantitative difference shows that in this subcorpus writers of articles use these adverbs more frequently to make guesses about possible developments in the pandemic, aiming to contribute to readers' opinion formation. In particular, W-occurrences of perhaps, maybe

The evidential adverbs apparently, allegedly and reportedly are nearly always W-occurrences, which is not surprising, since their very meaning encodes the grounding of the evidential assessments in specific types of external evidence, namely appearances in the case of apparently and linguistic messages in the case of allegedly and reportedly.

Epistemic-modal and evidential adjectives
The quantitative data on epistemic-modal and evidential adjectives are specified in Tables 7 and 8. The overall ratio is slightly higher in Subcorpus B: a possible reason is the 'objectivity' or 'intersubjectivity' inherent to adjectives; as was stated in Section 5.1, this intersubjectivity hints at the assumption that readers were already knowledgeable about the pandemic and thus prone to accept epistemic qualifications that sounded reasonable.
The difference in the ratio is accounted for by likely , by far the most common of all the adjectives, and also by unclear , sometimes used in Subcorpus B to refer to issues that, by the time the articles were written, should have been clarified due to previous experience with the pandemic and its effects, but were not: in these cases, unclear sometimes occurs with the verb remain or the adverb still , as in (27): (27) It is still unclear ⟨EVI, W ⟩ whether Beijing will avoid an outbreak such as in nearby Tianjin in the next few weeks. Instead, it once again puts China's zero-tolerance Covid containment strategy under a renewed international spotlight. (GJAN22-052).

Epistemic-modal lexical verbs
The frequency of epistemic-modal lexical verbs has been registered for each verb in two separate tables, one for the first person and another for the rest of the persons. As for the verbs of belief, the occurrences of think and believe in the first person are specified in Tables 9 and 10. The analysis is restricted to the Simple Present and the Simple Past: there are other forms that occur very occasionally, such as the Present Perfect and the Past Perfect, which will not be analysed for reasons of space.

Table 9
Frequency of the epistemic-modal lexical verb think in the first person in the two subcorpora.

Table 10
Frequency of the epistemic-modal lexical verb believe in the first person in the two subcorpora.
The results show that the first person occurrences are more common with think than with believe. By far the most common expression is I think in DRS, where it is used to quote beliefs held by other persons, mostly politicians or medical experts, as in (28), attributed to Paul Hunter, a professor of medicine at the University of East Anglia:
(28) "Taking all this together, I don't think ⟨EPI, 1 s, DRS, pres ⟩ we have peaked yet, but I think ⟨EPI, 1 s, DRS, pres ⟩ we are not that far away -or at least I hope so, " he added. (GJAN22-018).
The two verbs differ in that I think is more frequent in Subcorpus A while I believe is more common in Subcorpus B, where it is sometimes used as a hedge: in these cases, the main function of this expression is not to assess probability but to avoid sounding too authoritative, as in (29): (29) He told MPs: "While vaccination remains our very best line of defence, I believe ⟨EPI, 1 s, DRS, pres ⟩ it is no longer proportionate to require vaccination as a condition of deployment by statute. " (GFEB22-002).
Think and believe in the first person do not occur in non-direct attributed discourse (it is grammatically impossible) and occur only 9 times as W-occurrences: it may be said, then, that writers of articles rarely make explicit their role as formulators of epistemic qualifications. Remarkably, all but one of these occurrences belong to Subcorpus A, which, together with the results for epistemic-modal auxiliaries and adverbs (see Sections 5.2 and 5.3), indicates a stronger tendency in this subcorpus for journalists to use epistemic-modal expressions as a way to express their own epistemic qualifications. An occurrence which illustrates the expression of the writer's viewpoint is example (30), extracted from an article authored by the prestigious journalist Simon Jenkins, published on 2 April 2020 and significantly titled "Was I wrong about coronavirus? Even the world's best scientists can't tell me". This extract, which communicates his (unfortunately mistaken) beliefs about the evolution of the pandemic at that moment, contrasting them with those of his wife, contains two W-occurrences of I think:

(30) My wife and I share inputs, hear the same news and read the same papers. But I am an optimist and she is a pessimist. I think ⟨EPI, 1 s, W, pres ⟩ we could have stuck to the Swedish model. I think ⟨EPI, 1 s, W, pres ⟩ the crisis will be over in three weeks. She believes ⟨EPI, 3, pres ⟩ it will last months. It is not much comfort that we both have scientists on our side. (GAPR20-001).
The non-first-person occurrences of think and believe are illustrated in Table 11 . Following the overall tendency, they are more common in Subcorpus A than in Subcorpus B. Their distribution differs in that in this case believe is more common than think .
As for the verb know , Tables 12 and 13 cover the first person occurrences and the rest of the occurrences, respectively, making the distinction between affirmative and negative occurrences in order to distinguish between knowledge and lack of knowledge.
Again, the occurrences of all the subtypes are more frequent in Subcorpus A than in Subcorpus B. This higher frequency holds not only for negative occurrences but also for affirmative occurrences, even though the existing body of knowledge about the coronavirus was substantially larger when Subcorpus B was collected. This difference, which disconfirms Hypothesis 2 just as adverbs of certainty did (see Section 5.3), may well be accounted for by the need to clarify what was known and what was not known at the disconcerting time in which the articles collected in Subcorpus A were written. In particular, the difference is outstanding in first person plural occurrences in present and past tense, which total 42 occurrences in Subcorpus A and 24 in Subcorpus B. These cases concern the existing (lack of) knowledge available to bodies of experts or to mankind in general. This contrast between knowledge and lack of knowledge is clearly manifested in example (31), extracted from an article written by epidemiologist Dr Kathryn Snow:

(31) All modelling involves uncertainty. There are still many important things we don't know about the virus. There are also things we don't know about our interventions. For example, we know ⟨EPI, 1p, W, pres ⟩ that physical distancing is working, but we don't know ⟨EPI, 1p, W, pres, neg ⟩ which parts of it are the most effective. (GAPR20-004).

Table 11
Frequency of epistemic-modal think and believe in non-first-person occurrences in the two subcorpora.

Evidential verbs of appearance
Table 14 specifies the number of occurrences of the evidential verbs of appearance. Again, the verbs are more common in Subcorpus A than in Subcorpus B. Most of the cases are W-occurrences, which is not surprising since the lexical meanings of these verbs already have a component of external evidence, specifically appearances.
Curiously, the quantitative difference is remarkable for appear and look while the frequency of seem is similar in the two subcorpora. A possible explanation, which deserves further research, is that, although both seem and appear can be used as face-saving strategies, seem is perhaps more frequent with this use while appear tends to be straightforwardly used to communicate evidential qualifications. It may then be tentatively suggested that the need for face saving is probably similar in the two subcorpora, while the need for evidential qualifications is greater for Subcorpus A. Example (32) is a face-saving example of seem from Subcorpus B,

(32) Other things that I have learned is that if you mention vaccination in the media, particularly vaccination of children, then there is likely ⟨EPI, ADJ, W ⟩ to be a reaction. However, this only occurs if one's comments are picked up by the rightwing press - particularly the Daily Mail. The letter-writers also seem ⟨EVI, W ⟩ to use the same tropes, like the phrase "tick tock" (sometimes just that) or the Nuremberg trials and, of course, Bill Gates (who must ⟨EPI, W ⟩ get this stuff by the bucketload every day). (GJAN22-003)

where the verb is used not because of a lack of total commitment or a reliance on appearances (the journalist had direct access to the letters in question), but to hedge the unfavourable comment about the letter-writers.

Results and discussion: summary
The results of the quantitative analysis and the corresponding discussion may be summarised as follows:
- The most frequent expressions used for communicating epistemic qualifications are epistemic-modal auxiliaries, followed by epistemic-modal lexical verbs.
- The number of expressions of epistemicity is larger for all types of expressions in Subcorpus A than in Subcorpus B, with the exception of epistemic-modal and evidential adjectives. This quantitative difference occurs not only in expressions that weaken commitment to the information transmitted, but also in expressions that strengthen this commitment, such as the adverbs certainly, surely, clearly and obviously and affirmative occurrences of the verb know. The higher frequency of these expressions in Subcorpus A may well be interpreted as stemming from the need to signal knowledge or certainty with emphasis at the beginning of the pandemic, when uncertainty was pervasive.
- The writers of the articles rarely cited themselves as formulators of epistemic-modal qualifications (in other words, first person W-occurrences of I think, I believe and I know are rare). By contrast, W-occurrences are common with evidentials. It seems, then, that when writers of articles formulate epistemic assessments, they prefer to disguise that the formulations are theirs and to highlight that they are based on external evidence.
- Subcorpus A contains proportionally more AS occurrences, especially of modal auxiliaries, which journalists used to communicate tentative opinions of authorities or experts in the uncertain time when the articles were written. It also contains more W-occurrences than Subcorpus B, the differences being remarkable for epistemic-modal auxiliaries and adverbs and for I think. This relatively higher quantity of epistemic assessments voiced by the journalists themselves may be interpreted as a greater effort on their part to orientate readers' perception of the pandemic.

Social implications and pedagogical applications
It may well be considered that the research presented in the previous sections of the paper has a number of social implications and pedagogical applications, approached in Sections 6.1 and 6.2 respectively.

Social implications
The social implications of this paper are of several kinds. To start with, the expressions of epistemicity found in the two subcorpora of The Guardian show that this newspaper, in keeping with its status as quality press, has reflected the higher degree of uncertainty concerning the pandemic in April 2020 in contrast to the later time of January-April 2022. The higher number of epistemic expressions of all degrees of strength in the first subcorpus reflects the effort made by journalists and writers of opinion articles to carefully assess what was known, unknown, possible, likely or apparent, and to transmit these assessments to readers, within a general state of chaos and confusion. However, the difference in the number of epistemic expressions in the two subcorpora is moderate, not striking, because many doubts about the pandemic still remained in the first months of 2022.
Secondly, the analysis presented here may be considered to be of general interest. An overview of these careful epistemic assessments made in The Guardian as a quality newspaper might well contribute to increasing readers' carefulness about the choice of media from which they obtain information. In this respect, we cannot but think about the proliferation and wide spread of fake news on the pandemic (which reportedly played a major role in anti-vaccine campaigns), despite the efforts of quality media including The Guardian to warn readers against it. I believe that the paper in its present form is accessible not only to linguists, but to all kinds of readers with a high level of literacy. In particular, researchers or practitioners in social sciences such as psychology, sociology or political science might well find the results useful for their work.
The paper could also be adapted for those readers less used to academic discourse, by making the style more colloquial, decreasing lexical density and replacing specialised terms with expressions of more general use. In this way, a wide range of readers could benefit from an unhurried reading of the epistemic assessments analysed, which would give them an opportunity to reflect on the distinction between the known, the possible and the false, now that we constantly hear that these times are an era of "post-truth", a term defined by the Oxford English Dictionary as "[r]elating to or denoting circumstances in which objective facts are less influential in shaping political debate or public opinion than appeals to emotion and personal belief".9
Thirdly, the epistemic assessments analysed here also imply a consideration of readers as mature, capable of understanding the health crisis and struggling to develop the required tolerance of uncertainty. This attitude may be perceived in many of the examples cited in previous sections, which unashamedly presented the state of things as it was. Given the complexity of the situation, there was no room for false promises or easy solutions.
Fourthly, the higher number of occurrences where the writer of the article is the source of the epistemic qualifications (W-occurrences) in the April-2020 corpus, especially in the cases of epistemic-modal auxiliaries, adverbs and lexical verbs, also reflects writers' special effort to have an influence on readers' opinions in this difficult time, for example by making predictions about possible consequences of scientific advances or governments' policies.
Fifthly, last but not least, the comparative analysis of two subcorpora belonging to different stages of the pandemic gives us the opportunity to perceive in more depth the quick progress of medical knowledge as the months went by. This progress was out of reach in previous worldwide pandemics, such as the 1918 influenza pandemic, which originated in Kansas, USA but was popularly known as the Spanish flu. We must thank all researchers who have made this progress possible, as well as the myriad of health professionals all over the world who faced uncertainty in the front line at the beginning of the pandemic, often at the cost of their lives or long-term health problems.

Pedagogical applications
The social implications stated above, together with the topic of the paper and the step-by-step account of the research, lead us to consider that the pedagogical applications of this paper are manifold. The most obvious teaching contexts are university courses on areas such as English linguistics, English in the media or discourse/text analysis. In this kind of course, the paper could be used in its present form, or else the students might do qualitative or quantitative analyses of (some of) the epistemic expressions of part of the texts included in the corpus, or of similar texts. These activities would be useful for familiarising students with the diverse expressions of epistemicity in English and, more generally, for learning to do text analysis and raising critical reading awareness.
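As an illustration of the kind of quantitative exercise students might carry out, the sketch below counts the surface forms of four modal auxiliaries in the text of example (23), with the annotation tags removed. It is a hedged starting point only, not the procedure followed in this study: the function name `count_forms` is an assumption, and a raw form count cannot tell epistemic from non-epistemic uses (nor the auxiliary may from the month May), so each hit would still need manual annotation.

```python
import re
from collections import Counter

# Surface forms to search for (the four auxiliaries discussed in the paper).
MODALS = ("could", "may", "might", "must")


def count_forms(text: str, forms=MODALS) -> dict:
    """Count case-insensitive whole-word occurrences of each form."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(token for token in tokens if token in forms)
    return {form: counts[form] for form in forms}


# Text of example (23) with the annotation tags removed.
sample = (
    "Those who test positive for coronavirus antibodies will presumably "
    "have some immunity and in principle might be allowed back to work. "
    "It could make a dramatic difference for NHS staff and other carers "
    "who work with vulnerable people."
)

modal_counts = count_forms(sample)
print(modal_counts)
```

Students could then classify each hit by meaning and authorial voice, which is where the analytical work of the exercise lies.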
The paper could also be used in teaching contexts of different levels and areas of knowledge. Concerning level, this research could serve as a basis for designing exercises or even projects for non-university courses of English for adults or teenagers. The students would benefit from the advantages stated above for university students, to the extent allowed by their level. As for other areas of knowledge, this research is of interest for lecturers in many areas within social sciences, such as journalism, contemporary history, psychology, sociology or political science.

Conclusions and suggestions for further research
This paper has set forth a quantitative analysis of a wide range of expressions of epistemicity in a corpus of 400 newspaper articles from The Guardian , divided into two subcorpora belonging to different stages of the pandemic, April 2020 and January-April 2022. The analysis has shown that epistemic expressions are significantly more common in the first subcorpus, plausibly due to a stronger social need at that time to distinguish facts from probabilities, along with degrees of certainty and assessments on the basis of evidence. In particular, this subcorpus has many more occurrences of epistemic modal auxiliaries within the scope of non-direct attributed speech, by which journalists attributed epistemic judgements to experts, political authorities or documents such as reports or studies. This earlier subcorpus also displays a higher number of epistemic expressions belonging to the authorial voice of the writers of the articles, which hints that, at that time of uncertainty, these writers made a greater effort to have an influence on readers' perceptions of the pandemic.
This research has social implications, since it provides readers (not only linguists) with an opportunity to appreciate the carefully assessed epistemic qualifications found in the corpus, which might lead them to acknowledge the status of The Guardian and quality press in general in a time of proliferation of questionable social media as a breeding ground for fake news. It also has pedagogical applications: the paper could be used as didactic material, either in its present form or in a simplified version; the findings could be discussed, or else students themselves may carry out analyses of epistemic expressions of selected texts from this corpus or of similar texts. These kinds of activities would give students opportunities to become familiar with the expression of epistemicity in English and to develop skills for text analysis and critical reading.
Suggestions for further research include similar analyses of epistemic expressions in other mass media, such as right-leaning quality press, media in other formats and fake news detected by fact-checkers. The research could also be extended to mass media discourse produced in other countries and languages. In all these cases, the resulting comparative analyses would be of interest from the didactic point of view. Last but not least, further research could include later stages of the pandemic, which, hopefully, will witness a favourable evolution or even its end.

Declaration of Competing Interest
The author declares that she has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.