
The Effects of Perceived AI Use On Content Perceptions

Published: 11 May 2024

Abstract

There is a potential future where content created by a human and content created by an AI are indistinguishable. In this future, if you can’t tell the difference, does it matter? We conducted a 3 (Assigned creator: human, human with AI assistance, AI) by 4 (Context: news, travel, health, and jokes) mixed-design experiment where participants evaluated human-written content that was presented as created by a human, a human with AI assistance, or an AI. We found that participants felt more negatively about the content creator and were less satisfied when they thought AI was used, but assigned creator had no effect on content judgments. We also identified five interpretations for how participants thought AI use affected the content creation process. Our work suggests that informing users about AI use may not have the intended effect of helping consumers make content judgments and may instead damage the relationship between creators and followers.


1 INTRODUCTION

The release of artificial intelligence (AI) technologies, such as ChatGPT [2], Dall-E [3], Bard [4], and Midjourney [5] to the general public has led to a rapid rise in the use of AI to make everything from science fiction stories [35] to video news deep fakes [50, 53]. Advances in these technologies are making it increasingly difficult to tell when AI has been used to generate content [23], raising concerns for two reasons.

First, these technologies can suffer from hallucinations—the generation of results that are factually incorrect, but are presented as true [56]. Second, these technologies make it easy to create convincing and misleading text, images, and video [52]. When people are unable to judge the credibility of information it can lead to serious consequences, such as real world deaths [42]. This danger is likely to increase as the use of these technologies becomes widespread and AI-generated or AI-assisted content proliferates throughout the internet. In response, users, experts, and regulators have increasingly demanded a way to identify when AI has been used [1, 13, 24, 47, 49, 53] under the assumption that this will help people to judge the credibility of the content they’re seeing. But will indicating the use of AI technologies have the intended effect?

The answer is unclear when looking at prior research. Findings about how users evaluate content credibility on the internet have demonstrated that people primarily rely on superficial or easily accessible signals [15, 16], particularly in situations where misinformation will be less damaging. In these studies, the actual creator is often less relevant to users than site design, sponsor (hosting organization or entity), navigation, and ease of use [15, 19, 20, 40, 43, 51, 57]. These findings suggest that if users were informed about AI use, they may not consider the information in their credibility evaluations.

In contrast, research on the acceptance and trust of algorithmic outputs has demonstrated a bias in favor of humans in a variety of recommendation and forecasting scenarios [10, 11, 12, 37, 59], even when the algorithm has been proven to be more accurate. These findings suggest that revealing the use of AI could result in users subjecting content to more scrutiny or skepticism, having the intended effect of nudging users to be more critical in their judgments. Knowing AI was used may also affect how content is received in other ways, such as leading to the belief that the work was created with less intention [30] or is less creative, to deriving less enjoyment from it, or to viewing it as less well-written [11, 20].

Recent studies have approached the problem by testing whether people are able to distinguish between content created by humans versus an AI [23, 32, 46]. Although this work showed that many times people could tell the difference between creators and viewed the AI-generated content more negatively [32, 46], it is unclear how these results may change when there is no longer a discernible difference. In other words, if you can’t tell, does it matter?

To answer this question, we ran a pre-experiment to develop stimuli, creating three versions of text content (without the use of AI) for five different contexts and tested for differences between versions. We then conducted a 3 (assigned creator (within): human only, human with AI assistance, AI-generated) x 4 (context (between): news, travel, health, jokes) mixed-design experiment with 1641 participants to understand how adding labels for AI use affected user perceptions. Participants rated each version individually on dimensions like originality, trustworthiness, presentation, satisfaction, and how much effort they thought went into creating it. They also chose between all three versions in context-specific scenarios to explore whether their perceptions matched their behaviors. Finally, we asked participants about what prompted their choices as well as what they thought "AI-generated" and "with AI assistance" meant for how the content was created.

We found that when the assigned creator was displayed as human (without AI), participants were significantly more satisfied, had more positive perceptions of the creator’s qualifications, and felt more effort was put into the content than when the assigned creator was displayed as using AI, independent of context. Contrary to our expectations, assigned creator had no effect on content judgments, such as originality, trustworthiness, or presentation, and only a marginal effect on choices between versions of health content. Through thematic analysis, we identified five main interpretations for what use of AI meant for the content creation process. These themes differed by when participants thought handoffs between the human and AI occurred and what existing technology they mapped their interpretation to.

Our work offers the following contributions:

(1) We extend prior literature by exploring the effects that perceived use of AI has on the content itself, as well as how it reflects on the creator. With an increasing number of people turning to social media as an information source [7, 18, 55] and the growth of the creator economy [48], our results have implications for whether creators may want their use of AI technologies to be revealed.

(2) We combine previous approaches to empirically test how assigned creator affects both individual content evaluations and content comparisons.

(3) We extend our understanding of how users interpret information coming from AI sources by exploring scenarios where humans are assisted by AI or make use of AI tools.

(4) We believe we provide some of the first insights into what users think AI assistance or AI generation means for how content is created and the way it informs their perceptions and decisions.


2 RELATED WORK

A large percentage of the global population uses the internet [6] as a source of information, news, entertainment, and more. However, the availability and openness of the internet mean there are no restrictions to the content that can be posted, regardless of how accurate or biased the information is. Much of past research has therefore focused on how users evaluate the credibility or trustworthiness of internet content.

2.1 Content credibility and trustworthiness

Early approaches to content credibility relied on experts to define a list of criteria that users should consider in their evaluations, such as accuracy, objectivity, authority, currency, and coverage [21]. Since then, multiple studies have shown that a mismatch exists between the signals users know they should be looking at and what they actually rely on [15, 16]. For example, users were found to base their credibility judgments on low effort indicators, such as how professional the site appears, self-proclaimed expertise (example: awards listed on site), the reputation of the organization or entity hosting the content, the URL (example: .org, .edu), how clearly the content is presented, and ease of use or navigation [15, 16, 19, 20, 40, 41, 43, 51, 57]. Some of these studies have also suggested that users take different approaches depending on their context or motivation; for example, relying on more superficial signals when their motivation or sense of risk is low [14, 18, 25].

Although the creator identity may be used as part of user evaluations [19, 43, 51], the sponsor organization is relied on more frequently because users often already have familiarity with them and don’t need to expend effort on verification (versus taking the time to look up an individual’s credentials) [20, 41, 57]. These findings are echoed by work demonstrating that users rely on signals that are outside of the content to judge credibility, such as comparison with other sites and offline sources or corroboration of information across multiple sources [18, 39, 40, 41]. In social media settings and among younger users, external social cues have been leveraged to establish credibility in four ways: (1) conferred - when credibility is bestowed by a well regarded source; (2) tabulated - aggregated across peer ratings; (3) reputed - endorsed through personal and social networks; and (4) emergent - arising from a pool of resources and achieved through a system of open access [18, 41].

But with so many disparate portrayals of AI in the media, will a machine-based creator identity matter more and if so, how might it impact user perceptions?

2.1.1 Effects of machine-based sources on credibility and trust.

A majority of the work exploring acceptance of algorithmic or machine-generated content has found that participants demonstrated algorithmic aversion—a negative bias in behaviors and attitudes [30, 31] toward algorithmically sourced content (versus human sourced content), even in situations where the algorithm or machine output was objectively better [8]. For example, participants aligned their predictions less with algorithmic suggestions than with another human’s when forecasting [12] or receiving joke recommendations [58]. Participants also differed in the way they reacted to algorithmic versus human errors, penalizing algorithms but not humans [12]. Algorithmic aversion was consistently observed in both objective situations [10]—such as evaluating the trustworthiness of news headlines [38], trusting an expert human advisor over an algorithmic or novice advisor for financial predictions [59], and choosing a human doctor over a computer for medical analyses, diagnoses, and treatments [37]—and subjective contexts—such as movie recommendations [10] or choosing products with an emotional [22] or symbolic value [29].

In contrast, Logg et al. found that participants exhibited algorithmic appreciation, adhering more to advice from an algorithm than to that of a human [36], and Graefe et al. showed that articles with a computer creator were rated as more credible and higher in journalistic expertise [28]. These experiments also highlighted that although a computer creator did not affect most participants, experts were less receptive to algorithmic advice [36] and viewed it as less trustworthy [54].

2.2 Beyond content credibility

In addition to using the internet to find information, users may also consume or engage with highly subjective or opinionated content to find diversion, express their identity, establish relationships [7, 25, 27], or get money and attention [26]. As a result, the credibility or trustworthiness of content may not always be the most important dimension to consider.

For example, when content is being consumed for diversion or entertainment, how enjoyable, well presented, or creative the work is [11, 20, 34, 54] may be more important than the work being credible or trustworthy. When content engagement is part of identity expression, establishing relationships, or leads to actions that involve exchanges of monetary or social capital, perceptions of the creator and their process—such as how engaged they are, how much effort they put in, or their intentions or motivations in making the content—may matter more [22, 25, 29].

Few studies have examined the effects that a human versus a computer creator might have on dimensions beyond content credibility, and their results have been mixed. Ragot et al. found that participants thought AI-created paintings were less beautiful, novel, meaningful, and likeable than human-created paintings [46]. Kobis et al. demonstrated that users expressed a preference for poetry by human creators [32], and Clerwall found that news articles with human creators were considered more pleasant to read [11]. However, other studies showed no effects of an algorithmic/computer creator on perceptions of objectivity or presentation [11, 34, 54]. Some explanations that have been suggested for the variability between results are contextual, such as the investment in emotional [22] or symbolic value [29] or the amount of risk users believe they are undertaking [16, 40].

To summarize, the effect that perceived use of AI will have on judgments of credibility, enjoyment, originality, or presentation as well as judgments of the creator themselves remains an open question. Findings that creator identity is a less relevant signal than other contextual or superficial cues suggests that knowing AI was used may not influence content judgments. On the other hand, work on algorithmic trust largely implies that AI use will trigger algorithmic aversion, leading to negative reactions. In both bodies of work, results have varied and there has been some suggestion that seemingly contradictory findings may have been caused by contextual differences. For example, the difference in risk, emotional investment, or intention when looking at news articles versus a joke.

We therefore hypothesized that:

(1) People will use assigned creator as a signal to evaluate content more positively when they believe a human was involved in the creation process and more negatively when it appears to have been generated by an AI.

(2) The negative effects of perceived AI use will be stronger in contexts that are considered more objective and high risk than those that are subjective and low risk. More concretely, AI use will have the most negative impact for news > health > travel > jokes [10].

To test these hypotheses, we first developed our stimuli in a pre-experiment, creating three versions of text content (without the use of AI) for five contexts (travel, news, recipes, health, and jokes) and tested them to validate that there were no differences between versions. We then conducted a 4 (context, between-subjects) x 3 (assigned creator, within-subjects) mixed-design experiment with 1641 participants where we randomly presented the creator of each version as a human, a human with AI assistance, or as an AI. We next describe the design of our experiment and results.


3 PRE-EXPERIMENT: STIMULI DEVELOPMENT

Prior work in how users make content credibility judgments has suggested that strategies differ depending on the context and the level of consequentiality that misinformation is perceived to have [16, 40]. Informed by this work, we chose five contexts we believed differed by level of risk, emotional investment, and intention: news, health, travel, recipes, and jokes [10, 17, 20, 29, 38, 58].

Context | Version 1 | Version 2 | Version 3
Recipes | Banana Applesauce Bread | Cinnamon Banana Bread | Very Banana Bread
Jokes | The Pirate | Dave Knows Everyone | Planting Tomatoes
News | Earthquake Rocks | Hurricane Flora Makes | Forest Fire Spreads
Travel | Museum of Science | Exploratorium | Science Center
Health | Languishing | Anxiety | Burnout

Table 1: Titles of each version by context.

To create our versions, we started with existing posts from Reuters, the New York Times, AllRecipes, Reddit, and TripAdvisor, then copied their format or edited them to create material that was similar in length, language, level of detail, and context. For example, we looked at news articles about natural disasters posted to Reuters and the New York Times and wrote three news articles following the patterns of:

A title naming the disaster and location.

A quote from an official that suggested an action for the reader to take.

At least one statistic related to the severity of the disaster.

A detail about the area of effect.

We made similar edits to recipes, jokes, mental health articles, and travel location descriptions. The title of each version is shown in Table 1 and a full example is shown in Table 2. The full versions of content for each context are available in our supplementary materials, Section A.

3.1 Measures

We designed our pre-experiment to collect data in three ways.

3.1.1 Individual content evaluations.

Participants first performed individual evaluations of each version to understand how perceptions were affected in isolation. These individual evaluations are similar to what a user might experience if they followed a direct link to content on the internet. Because the aspects that users value may differ between contexts, we chose five dimensions for participants to evaluate across all contexts: (1) originality; (2) presentation; (3) effort going into creating it; (4) how qualified the creator was to have created it; and (5) overall satisfaction. We asked about how informative and trustworthy the content was for all contexts except for jokes. For jokes we asked participants to evaluate how enjoyable each version was. Evaluations were on a 5-point unipolar Likert scale (example: 1 = Not at all qualified, 5 = Extremely qualified).

3.1.2 Version comparisons.

After evaluating each version individually, participants saw all three versions at the same time and were asked to choose between them in three context-specific questions, for example, "If all of these were breaking news stories for today, which would you be most likely to suggest to someone else who wanted to know about the event(s)?" Participants could select from the three versions or choose "None." We designed the first two choice questions to be comparisons of the content, where the choice was not exclusionary. For example, "Which news article did you think was the most factual?" We designed the third question to mimic a behavioral choice that implied the exclusion of the other choices. For example, choosing which recipe to try, which location to go to, or which article to share. A full list of comparison questions is available in our supplementary materials under Section C.

3.2 Participants

The survey was administered using Qualtrics and we recruited 310 participants (representative of US gender and age distribution) for each context using an anonymous Cint survey panel, for a total of 1550 responses. Participants took an average of 6 minutes to complete the survey and were compensated $2.30. After data scrubbing, we were left with 1197 valid responses (n = 1197): news (n = 217), travel (n = 246), health (n = 251), jokes (n = 239), and recipes (n = 244). Responses were excluded if they met one of the following criteria (a filtering sketch follows the list):

Obvious gibberish in the open-ended response.

Response time < 60 seconds and the open-text response had no detail (example: "None").

Mismatch of having chosen a version but not providing an open-text response.

Open-text responses that did not make sense as a factor for choosing a version (example: "Canada").
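The second and third criteria above are largely mechanical. A minimal filtering sketch in pandas is shown below; the column names (response_time_sec, open_text, chose_version) are hypothetical, and the judgment-based criteria (gibberish, irrelevant answers) would still be flagged by hand.

```python
import pandas as pd

# Hypothetical column names; the real survey export may differ.
df = pd.read_csv("pre_experiment_responses.csv")

open_text = df["open_text"].fillna("").str.strip()

# Criterion: response time under 60 seconds with no detail in the open text.
too_fast_no_detail = (df["response_time_sec"] < 60) & (open_text.str.len() < 5)

# Criterion: a version was chosen but no open-text explanation was given.
missing_explanation = df["chose_version"].notna() & (open_text == "")

# Gibberish or irrelevant open-text answers are judged manually and recorded
# in a hand-coded boolean column (assumed here).
manual_exclude = df.get("manual_exclude", pd.Series(False, index=df.index))

valid = df[~(too_fast_no_detail | missing_explanation | manual_exclude)]
print(f"{len(valid)} valid responses out of {len(df)}")
```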

Museum of Science
The Museum of Science features over 600 interactive and informative exhibits centering around math, engineering, biodiversity, and more. Whether you’re interested in learning about dinosaurs, space, or the human body, you’re sure to find something to interest you at the Museum of Science. Watch educational shows and live presentations at the planetarium, one of the largest in the world. Experience cool 4D movies on the domed IMAX. The museum is open seven days a week and admission is free for children under two years old. Tickets to the museum include access to films and planetarium shows at a discounted rate.

Exploratorium
The Exploratorium is more than a museum—it’s a gateway to exploring science, art, and human perception with every visit. Let your curiosity roam through more than 600 interactive exhibits in the amazing realms of physics, anatomy, human perception, and Earth’s ecosystems. Learn through hundreds of indoor and outdoor interactive exhibits, including the Tactile Dome and planetarium. Enjoy the vibrant bayside energy and waterfront pier. The Exploratorium is open Tuesday through Sunday and admission is free for children under three years old. Tickets include access to the Tactile Dome and planetarium shows at a discounted rate.

The Science Center
The Science Center is a world-class institute for stimulating curiosity featuring over 600 hands-on exhibits and inquiry-based experiences. Through fun, memorable experiences you’ll learn about human inventions and innovations, the life processes of living things, and space exploration. Experience other worlds on the IMAX and test your problem solving skills in our Discovery Rooms. Discover nearly 400 plant and animal species in our ecosystem pavilion. The Science Center is open daily and admission to our permanent galleries is free. Tickets are required for special exhibitions and the IMAX, but can be purchased together for a discounted rate.

Table 2: Example: The three versions of travel content without a source.

Responses were anonymous, but we collected gender, age, and self-reported frequency of looking at each context online based on prior findings that familiarity with a context affects perceptions of content credibility [17, 57] and that age may play a role in the evaluation strategies used [18]. Table 3 shows an overview of the sample age distribution and Figure 1 shows the frequency of looking at that type of content online by context.

Figure 1: Pre-experiment, frequency that participants reported looking at their context online.

3.3 Procedure

We ran five separate surveys, one for each context. After consent, participants were given a brief introduction to their context. For example, if they were in the travel context: "Imagine that you are going on a road trip to a new city. You’re looking for places to visit and your cousin said they’d like to bring their children to a science museum. Read the description of the three options and answer a series of questions to evaluate them." Participants evaluated each version of content in random order, then were shown all three versions together and asked to decide between them in context-specific questions. Last, participants were asked for basic demographic information of age, gender, and how frequently they looked at online content for each of the five contexts.

Context | 18-24 | 25-34 | 35-44 | 45-54 | 55-64 | 65-74 | 75+
News | 8.8% | 14.3% | 15.7% | 18.4% | 24.9% | 14.7% | 3.2%
Travel | 10.6% | 18.3% | 17.1% | 16.7% | 21.1% | 13.0% | 3.3%
Health | 13.5% | 15.5% | 14.3% | 19.5% | 16.7% | 16.7% | 3.6%
Recipes | 11.1% | 16.8% | 14.3% | 19.3% | 18.0% | 17.2% | 3.3%
Jokes | 10.7% | 15.8% | 18.1% | 20.0% | 22.8% | 16.7% | 6.5%

Table 3: Distribution of respondents for pre-experiment by age.

3.4 Analysis

We ran a linear regression to test gender, age, and frequency of looking at the context as possible covariates and found that frequency of looking at the context (e.g., frequency of viewing jokes online for jokes, frequency of looking at recipes online for recipes) and age were consistently significant predictors. To understand the directionality of these effects, we ran a secondary analysis of a repeated measures Analysis of Variance (ANOVA) using each dimension as the dependent variable and either age or frequency of looking at the context as the independent variable. For post-hoc analyses, we used the Bonferroni correction. Overall, we found that younger age ranges made more favorable judgments when evaluating versions of health posts and recipes than those in the 65-74 age range and that higher frequency of viewing content in a context tended to result in more favorable judgments across all contexts. We found no strong interaction effects of age or frequency of looking at the context with version. We omit the detail of these results from this report because they were not part of our hypotheses, but provide a brief summary:

News. Ages 65-74 more favorably rated news articles than those in the 18-24 and 25-34 ranges. News articles were rated more favorably by participants who looked at news more vs less frequently.

Travel. Age had no significant effects, but location descriptions were rated more favorably by participants who looked at travel content more frequently vs less frequently.

Health. Participants in the 45-54 age range more favorably rated health articles than those in the 65-74 age range. Participants who looked at health articles more frequently had more favorable ratings than those who looked at health articles less frequently.

Recipes. Participants in the 18-24, 25-34, and 35-44 age ranges more favorably rated recipes than those in the 65-74 age range. Recipes were rated more favorably by participants who looked at recipes more vs less frequently.

Jokes. Age had no significant effects, but jokes were more favorably rated by participants who looked at jokes more vs less frequently.

Based on the consistent effects of age and frequency of looking at context content, we performed a one-way Analysis of Covariance (ANCOVA) with version as an independent variable (repeated), individual evaluations of each version on the dimensions described in the Measures section as the dependent variables, and used age and frequency of looking at context content as covariates. We used the Dunn-Šídák correction in post-hoc analyses.
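As a rough illustration of this analysis (not the exact repeated-measures ANCOVA procedure we used), the same question can be approximated in Python with a linear mixed model: a random intercept per participant captures the repeated factor, and age and context-viewing frequency enter as covariates. The column names (participant_id, version, age, freq_context, rating) are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Long-format data: one row per participant x version, hypothetical columns.
df = pd.read_csv("pre_experiment_long.csv")

# A random intercept per participant approximates the repeated-measures design;
# age and frequency of viewing the context are included as covariates.
model = smf.mixedlm(
    "rating ~ C(version) + age + freq_context",
    data=df,
    groups=df["participant_id"],
)
result = model.fit()
print(result.summary())
```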

3.5 Results

We found no significant differences between versions for most dimensions, as shown in Table 4.

Dimension | Travel | Recipes | Health | News | Jokes
How enjoyable it was | N/A | N/A | N/A | N/A | p = .90
How informative it was | p = .53 | p = .34 | p = .86 | p = .84 | N/A
Originality | p = .87 | p = .63 | p = .39 | p = .24 | p = .21
Presentation | p = .85 | p = .26 | p = .81 | p = .10 | p = .41
Trustworthiness | p = .46 | p = .12 | p = .33 | p = .04* | N/A
Effort put into content | p = .83 | p = .15 | p = .28 | p = .64 | p = .05*
Qualified to create | p = .18 | p = .13 | p = .002** | p = .74 | p = .69
Satisfaction | p = .72 | p = .40 | p = .95 | p = .08 | p = .76

Table 4: Pre-experiment results showing significant differences between versions for news trustworthiness, effort going into the creation of jokes, and how qualified the creator was to create health articles. * = p < 0.05, ** = p < 0.01

3.5.1 News.

There was a marginal main effect of version on how satisfied participants were with the news articles \(F(2,428)=2.59, p =.08, \eta _{p}^{2} =.012\). Post hoc comparisons indicated that participants were more satisfied with the Hurricane article (M = 3.64, SE =.07) than the Forest Fire article (M = 3.44, SE =.07, p =.03). There was a significant main effect of version on how trustworthy participants found the news articles \(F(2,428)=3.06, p =.04, \eta _{p}^{2} =.013\), but post-hoc comparisons did not reveal significant differences.
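For readers less familiar with the effect-size notation, the reported partial eta squared follows from the F statistic and its degrees of freedom through the standard identity below, shown here for the satisfaction effect above:

\[ \eta _{p}^{2} = \frac{F \cdot df_{effect}}{F \cdot df_{effect} + df_{error}} = \frac{2.59 \times 2}{2.59 \times 2 + 428} \approx .012 \]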

3.5.2 Jokes.

There was a significant main effect of version on how much effort participants thought went into creating the jokes \(F(2,472)=3.06, p =.05, \eta _{p}^{2} =.013\). Post hoc comparisons showed that participants thought marginally more effort was put into the Planting Tomatoes joke (M = 3.26, SE =.07) than the Pirate joke (M = 3.12, SE =.07, p =.07).

3.5.3 Health.

There was a significant main effect of version on how qualified participants felt the creator was to write the health article, despite there being no creator listed \(F(2,496)=3.31, p =.002, \eta _{p}^{2} =.025\). Post hoc comparisons showed that participants thought the creator of the Anxiety article (M = 3.44, SE =.07) was more qualified than the creator of the Burnout article (M = 3.38, SE =.07, p =.05).

3.5.4 Choices between versions.

When choosing between versions, we saw consistent preferences in most contexts. A full table of percentages for each context is included in the Appendix, section D:

News: The Hurricane article was chosen as the most factual (41.7%), as having the best coverage (40.7%), and as the article participants would be most likely to suggest to others (38.7%).

Travel: The Exploratorium description was chosen as the most helpful (37.0%) and the location participants were most likely to go to (34.1%), but versions were equally chosen as the most enjoyable (31-33%).

Health: The Anxiety post was consistently chosen as the most valuable (42.6%), approachable (44.8%), and recommended (42.2%).

Recipes: Although the Applesauce banana bread recipe was chosen as the easiest to make (45.3%), the Cinnamon banana bread recipe was chosen as the most tasty (44.9%) and the one participants would be most likely to try (40.4%).

Jokes: The Pirate joke was chosen as the most liked (38.1%), funniest (38.5%), and the joke participants were most likely to share (36.8%).

3.6 Implications for experiment design

We found some main effects of version on perceptions of trustworthiness for news, effort for jokes, and how qualified participants thought the creator was for health. In the news context these perceptions aligned with participants choosing the Hurricane article as the most factual, having the best coverage, and the one they would be most likely to share. Similarly in the health context, participants felt the creator of the Anxiety article was more qualified and chose it as the most valuable, approachable, and what they would share with others. In the joke context, participants felt more effort was put into the Planting Tomatoes joke, but the Pirate joke was chosen as the most liked, funniest, and the joke they were most likely to share.

The inconsistency of these effects made it difficult to identify whether choice preferences were because of the perceived differences in version (e.g., was the Hurricane article chosen as more factual and as having better coverage because it was more trustworthy?) or because of outside factors, such as personal experience or relevance (e.g., living in a hurricane zone). In the health context, we hypothesized that personal familiarity may have had a strong effect, explaining differing perceptions of creator qualifications despite no creator being listed.

To explore the reasons behind these differences, we added individual versions of the choice questions to our experiment. We also asked participants to briefly explain their decision in the last question, which simulated a behavioral choice.

Considering the lack of consistent differences between versions in each context and the small effect sizes, we considered the possibility that these differences may have stemmed from noise in the data or Type I errors. In our full experiment, we therefore tried to replicate our pre-experiment results by retesting for significant version effects. For any version effects that were replicated, we excluded those comparisons when testing for the effects of assigned creator and context. We also decreased the number of contexts we tested to reduce complexity, proceeding with the following:

Travel because we found no significant differences between versions.

News because it is one of the most well studied contexts for content credibility judgments.

Health because it has been suggested as an area where misinformation has higher risk.

Jokes because the context is highly subjective and low risk.

In the next section, we provide further detail on the changes we made for our full experiment and our analysis methods for the open-ended responses.


4 MAIN EXPERIMENT: EFFECTS OF ASSIGNED CREATOR

Informed by our findings from our pre-experiment, we designed a 3 [Assigned creator (within): human, human assisted by AI, AI-generated] x 4 [Context (between): news, travel, health, jokes] mixed-design experiment where participants evaluated content that was presented as created by a human, by a human with AI assistance, or AI-generated. We used the same materials and format as the pre-experiment with the following additions:

Removed one context. We dropped Recipes from our experiment to reduce study complexity.

Added an assigned creator. Although all content was created by a human, we randomly assigned labels of "By [name]," "By [name] with AI assistance," and "AI-generated [content]" to each version so that participants saw one of each. We used gender neutral first names combined with the most common last names in the US. Table 5 shows an example of what this looked like.

Individual versions of choice questions. As discussed in our pre-experiment, we added individual versions of choice questions to better understand the relationship between perceptions in isolation and choices between versions. For example, "If all of these were breaking news stories for today, how likely would you be to suggest this news article to someone else who wanted to know about the event(s)?" on a 5-point Likert scale (1 = Not at all likely, 5 = Extremely likely).

Open-ended question about why choice was made. We added an open-ended question asking participants to explain their final decision between versions to disentangle whether choices were made because of assigned creator, personal experience, or other signals.

Open-ended questions about AI meaning. Although many previous findings highlighted algorithmic aversion or appreciation, the beliefs and attitudes driving these behaviors have yet to be fully disentangled. We hypothesized that because AI technologies are relatively new, participants may not understand what AI use means for content creation and therefore may over-index on or discount it as a signal in their judgments. To explore how highlighting AI use in assigned creator may have influenced participant perceptions and decisions, we also asked participants what they thought AI use meant for how the content was created. For example:

"When a news article is presented as, "By John Doe with AI assistance," describe in 2-3 sentences what you think it means for how the content was created."

"When a news article is presented as, "AI-generated news article," describe in 2-3 sentences what you think it means for how the content was created."

Creator check. Prior work has suggested that the creator’s identity may not be used as much as other signals for judging online content [20, 41, 57]. We therefore added a 7-option multiple choice question about what differed between versions to explore how creator might be used in a natural setting, where it would be one of many possible signals.

Museum of Science
By Casey Scott
The Museum of Science features over 600 interactive and informative exhibits centering around math, engineering, biodiversity, and more...

Exploratorium
By Corey Brown with AI assistance
The Exploratorium is more than a museum—it’s a gateway to exploring science, art, and human perception with every visit...

The Science Center
AI-generated location description
The Science Center is a world-class institute for stimulating curiosity featuring over 600 hands-on exhibits and inquiry-based experiences...

Table 5: Example: The three versions of travel content with the assigned creator added.

4.1 Participants

We recruited 550 participants for each context through an anonymous Cint panel for a total of 2200 responses. We recruited for a representative gender and age distribution and the survey was administered using Qualtrics. Because panels for the pre-experiment and experiment were anonymous, we were not able to ensure that there were no duplicate participants between rounds.

Participants took an average of 11.8 minutes to complete the survey and were compensated $2.80. We excluded responses using the same criteria as our pre-experiment, but increased the required response time to > 120 seconds to accommodate the additional questions. These removals resulted in 1641 valid responses (n = 1641), with 377-434 participants per context: news (n = 404), travel (n = 426), health (n = 377), jokes (n = 434). Table 6 shows an overview of the sample age distribution and Figure 2 (right) shows the self-reported frequency of looking at content for each context.

Context | 18-24 | 25-34 | 35-44 | 45-54 | 55-64 | 65-74 | 75+
News | 9.0% | 18.7% | 15.4% | 17.9% | 19.9% | 15.7% | 3.5%
Travel | 10.1% | 16.0% | 16.5% | 17.2% | 21.2% | 16.0% | 2.8%
Health | 8.2% | 12.2% | 15.4% | 19.9% | 20.4% | 19.1% | 4.8%
Jokes | 8.1% | 16.2% | 17.1% | 17.8% | 17.4% | 17.8% | 5.3%

Table 6: Distribution of respondents for our main experiment by age.

Figure 2: Main experiment, frequency that participants reported looking at their context online.

4.2 Procedure

We ran four separate surveys on Qualtrics, one for each context. After consent, participants were given the same context introduction as in our pre-experiment and evaluated each version in random order. A randomly selected assigned creator was added to each version so that each participant saw the following assigned creators once: "By [name]," "By [name] with AI assistance," or "AI-generated [content]." After evaluating each version, participants were reminded of all three versions and asked to decide between them in context-specific comparisons. Participants were then asked to describe what factored into their choice as well as what they thought AI assistance and AI-generated meant for the way the content was created. Finally, participants completed a multiple choice assigned creator check and provided the same demographic information as in the pre-experiment.

4.3 Analysis

Similar to our pre-experiment, we ran a linear regression to test gender, age, and frequency of looking at the context as possible covariates. We found that age and frequency of looking at context had similar effects as in our pre-experiment and therefore used them as covariates.

To retest for the effects of version, we performed a one-way Analysis of Covariance (ANCOVA) with version as the independent variable (repeated), individual dimension ratings as the dependent variables, and used age and frequency of looking at context content as covariates. We used the Dunn-Šídák correction in post-hoc analyses.

Figure 3: Results showing significant effects of the assigned creator on satisfaction (left), creator qualification to create the content (middle), and creator effort (right).

4.3.1 Version effects from the pre-experiment.

In our pre-experiment, we found a significant main effect of version on how trustworthy participants found news, how qualified they thought the creator was to create the content for health, and how much effort they thought went into the jokes. When we retested for these effects in our main experiment, we found no support for the effects of version on news trustworthiness (p =.24) or on how qualified participants felt the creator was to write health articles (p =.75). We found evidence that version still had a significant main effect on how much effort participants thought went into creating the joke \(F(2,1294)=3.93, p =.02, \eta _{p}^{2} =.006\). Post hoc comparisons showed that participants still felt that significantly more effort was put into the Tomatoes joke (M = 3.19, SE =.05) than the Pirate joke (M = 3.02, SE =.05, p =.04). They also felt that marginally more effort was put into the Tomatoes joke (M = 3.19, SE =.05) than the Dave joke (M = 3.03, SE =.05, p =.052). We therefore believe that most of the significant effects from our pre-experiment may have been Type I errors or caused by noise in the data. However, the replication of the effort finding in the Jokes context led us to exclude that comparison from our analysis of differences between assigned creators.

For testing the effects of assigned creator and context, we performed a one-way Analysis of Covariance (ANCOVA), using the Dunn-Šídák correction for post-hoc analysis. We used assigned creator and context as independent variables, individual version ratings as dependent variables, and participant age and frequency of looking at context content as covariates. To compare choices between assigned creator and context, we performed a Chi-square Goodness of Fit test.
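As a minimal sketch of the choice analysis, assuming a table of individual choices with hypothetical columns context and chosen_creator, the chi-square statistic for the creator-by-context cross-tabulation can be computed with scipy; this uses the contingency-table form of the test rather than our exact procedure.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical choice data: one row per participant, recording the assigned
# creator of the version they chose and the context they were in.
choices = pd.read_csv("main_experiment_choices.csv")

# Cross-tabulate context by the assigned creator of the chosen version.
table = pd.crosstab(choices["context"], choices["chosen_creator"])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}, N = {int(table.to_numpy().sum())}) = {chi2:.1f}, p = {p:.3f}")
```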

For open-ended responses, we created separate codebooks for what factored into their choices and for what participants thought AI assistance or AI-generated meant.

4.3.2 Codes for choice decisions.

Using an iterative coding process, we developed a codebook of seven main themes for the reasons participants gave for choosing a particular version (participant IDs use the first character to denote condition: h=health, t=travel, n=news, j=joke):

Assigned creator: related to the creator’s human identity or use of AI. For example, "AI articles were not considered in my decision"–h187

Personal experience/relevance: related to the participant’s experiences, interest in the context, or sense of humor. For example, "I suffer from anxiety so I can relate better to the explanation."–h109

Content detail: how much information participants thought the content had or particular details they felt were included. For example, "Gave the most detail" –n67

Presentation: perceived writing quality or style. For example, "I like the way it was presented and sounds like a good time."–t20

Credibility: how factual, reliable, or truthful the content was perceived to be. For example, "The facts and descriptions were more believable"–n281

Share-ability (Jokes only): how easy they believed the joke would be to share with others. For example, "How easy it is to tell and if I can remember it."–j369

N/A: comments that were not related to reasoning or did not specify a single version. For example, "They are all good."–h263

In an interrater reliability analysis using Cohen’s kappa over a 10% overlap of responses, we found substantial agreement between raters, Kappa = 0.67 (p < .001).
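Agreement of this kind can be computed directly from the two raters' code assignments on the overlapping responses; a minimal sketch with scikit-learn is below, where the example labels are hypothetical placeholders drawn from the codebook above.

```python
from sklearn.metrics import cohen_kappa_score

# Codes assigned by each rater to the same overlapping responses
# (hypothetical labels standing in for the choice-decision codebook).
rater_a = ["personal", "detail", "creator", "presentation", "credibility", "na"]
rater_b = ["personal", "detail", "creator", "credibility", "credibility", "na"]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa = {kappa:.2f}")
```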

4.3.3 Codes for AI use.

For responses to what AI-generated and AI assistance meant for the content creation process, we defined five main themes, each with between 2-12 sub-themes, for a total of 27 codes. Examples of themes and sub-themes include:

No human involvement: fully created using AI, AI created using online information

Human in the loop: human prompt and AI written, AI prompt and human written, human written with AI editing, AI written with human editing

Non-computer: aggregated from multiple sources, human collaboration

Comments about AI: more factual, less factual, higher quality, lower quality, less original

N/A: not categorizable, don’t know

We performed an interrater reliability analysis using Cohen’s kappa over a 10% overlap of responses. We found substantial agreement between raters for both AI-generated, Kappa = 0.79 (p < .001), and AI assistance, Kappa = 0.69 (p < .001).

4.4 Quantitative results

Participants were less satisfied when the assigned creator was AI-generated or AI assisted than when it was purely human, see Figure 3 (left). There was a significant main effect of assigned creator on participant satisfaction \(F(2,3274)=4.98, p =.007, \eta _{p}^{2} = 0.003\). Compared to when the assigned creator was human (M = 3.36, SE = 0.03), participants were less satisfied when the assigned creator was AI-generated (M = 3.27, SE = 0.03), p =.003 or a human with AI assistance (M = 3.30, SE = 0.03), p =.03. The difference in satisfaction between AI-generated and AI assisted assigned creators was not significant.

Participants felt that creators were less qualified when assigned creator mentioned AI use than when it was presented as created by a human, see Figure 3 (middle). There was a significant main effect of assigned creator on perceptions of the creator’s qualifications to make the content \(F(2,3274)=3.064, p =.047, \eta _{p}^{2} = 0.002\). Participants thought that the creator was significantly more qualified when the assigned creator was purely human (M = 3.34, SE = 0.02) than when the assigned creator was a human assisted by AI (M = 3.26, SE = 0.03), p =.002 or when the content was presented as AI-generated (M = 3.23, SE = 0.03), p <.001. The difference between AI-generated and AI assisted assigned creator was not significant for perceptions of how qualified the creator was.

Participants felt marginally less effort went into content when the assigned creator was AI-generated or a human with AI assistance than when it was purely human, see Figure 3 (right). There was a marginal main effect of assigned creator on how much effort participants thought it took to generate the content \(F(2,2406)=2.58, p =.08, \eta _{p}^{2} = 0.002\). Participants thought that significantly less effort was put into the content when the assigned creator was AI-generated (M = 3.12, SE = 0.03) than when the assigned creator was purely human (M = 3.24, SE = 0.03, p <.001) and marginally less than when the assigned creator was a human with AI assistance (M = 3.18, SE = 0.03, p =.09). There were no significant differences in perceived effort when the assigned creator was purely human versus a human using AI assistance.

There was a significant main effect of assigned creator on how informative participants judged content to be \(F(2,2400) = 3.17, p =.04, \eta _{p}^{2} = 0.003.\) However, we saw no significant differences between assigned creator in the post-hoc tests. For health content, there was a significant main effect of assigned creator on how valuable participants felt content was \(F(2,750)=3.69, p =.03, \eta _{p}^{2} = 0.01\), but we saw no significant differences between assigned creators in the post-hoc tests. We found no significant interaction effects between assigned creator and context, and no significant effects of assigned creator on participant evaluations of content originality, presentation, trustworthiness, and likelihood to share. We also found no significant effects of assigned creator on any dimensions that were context-specific (news coverage, joke enjoyment, etc.).

4.4.1 Content Comparisons.

In a Chi-square Goodness of Fit test between assigned creator and context, we only found a marginal difference in what participants would share, \(\chi ^{2}(6, N = 1533) = 10.9, p =.09\). Participants were slightly more likely to share health articles when the assigned creator was purely human than when it was AI-generated, as shown in Figure 4, but the difference was not significant. We also conducted a Chi-square Goodness of Fit test between having an assigned creator and having no assigned creator (from the pre-experiment). We found no significant effect of having an assigned creator on the version participants chose for any context across all choice questions.

Figure 4: Distribution of assigned creator selected when asked to choose which version to share with others.

4.5 Qualitative Results

To explore the factors influencing participant choices and perceptions, we asked participants to explain their response to the last behavioral choice question. We also asked participants to describe what they thought "By John Doe with AI assistance," and what "AI-generated [content]" meant for how the content was created.

4.5.1 What signals did participants use to make a decision?

We found that almost no participants based their decisions on the assigned source. Instead, a majority of participants said their decisions were based on personal factors, such as their own experiences, interests, and judgments for the travel, health, and jokes contexts, as shown in Table 7. For example, saying they suffer from anxiety, so they favored the Anxiety health article. The exception to this was the News context, where the perceived level of detail and credibility were more frequently used to make decisions. Of note, although a majority of participants in the News context said they chose the version with the most detail, the version they were referring to differed between participants.

Context | Personal XP | Content detail | Presentation | Credibility | Share-ability | Assigned creator
News | 19.1% | 38.9% | 15.1% | 12.6% | N/A | 4.7%
Travel | 43.7% | 28.4% | 17.8% | 1.2% | N/A | 2.3%
Health | 49.1% | 18.0% | 15.6% | 1.9% | N/A | 2.7%
Jokes | 49.3% | 30.0% | 7.4% | N/A | 6.9% | 1.6%

Table 7: Signals participants reported using to choose between versions, by context.

4.5.2 Meaning of AI assistance.

We identified five main interpretations of how AI was used, differentiated by when they thought humans and AIs handed work off to each other. Numbers in this section in the format of (X/Y) mean that X unique respondents out of the Y total made comments that aligned with that theme. Quotes contain a letter, referring to which context the participant was in (n=news, t=travel, h=health, j=joke) and a participant ID.

Theme 1: Handing over the concept to be written. Most responses in this theme described the human having the idea or providing an outline and the AI writing the content (144/1641).

"I think John Doe told the AI program what he wanted to write about. He also told AI what style he wanted. He probably also indicated the length of writing." –t242

Participants frequently referenced ChatGPT and described workflow variations, such as a human prompting an AI to write the content, a human prompting an AI and then editing the resulting content, a human creating elements and the AI putting them together, or the human providing the prompt and the AI gathering the needed facts as well as writing. Fewer responses described the AI generating the idea and the human doing the writing (15/1641).

Theme 2: Dividing ownership between facts and writing. Some participants thought that AI assistance meant that the human provided or verified the facts and the AI did the writing (37/1641). More frequently, participants thought that the AI looked up, generated, or verified the facts and the human did the writing (74/1641).

"The author asked siri or Alexa or anoth ai platform [sic] about the subject matter. Where the AI pulls information from the web so the author doesn’t have to do as much if any research to write the article." –h289

We also found that when participants described the effects of AI in the context of how factual or accurate content was, AI assistance was believed to be less accurate and factual (73.6% of 57 mentions), but AI-generated content was believed to be more accurate and factual (80.0% of 40 mentions). We found that when describing AI, responses in this theme frequently compared AI to computers, reasoning that computers are good at facts, do not lie, and have access to all the information on the internet, which may have driven these beliefs.

Theme 3: Handing off a draft for editing. Most participants in this theme thought that a human wrote a draft of the content and then used the AI to correct grammar, spelling, formatting, punctuation, and language issues (111/1641).

"I think the content was created by a human with the assistance of artificial intelligence. i also think it means that the content was mostly written by a human and then cross checked for grammar, spelling and punctuation by artificial intelligence." –j387

Fewer participants described the AI creating the draft and the human editing the work (39/1641). Participants in this theme often referenced autocorrect, spellcheck, and autocomplete in their descriptions.

Theme 4: Work is handed off in the middle of writing. A small number of participants described the AI’s role as either finishing the writing started by a human (19/1641) or starting the writing and giving it to the human to finish (2/1641). For example:

"I think one it’s got a name and then with AI that parts of the joke was generated by AI and then somebody else finished it off..." –j238

Theme 5: No human intervention. Although participants were asked about their interpretation of "John Doe with AI assistance," some participants still thought that this meant the content was completely AI-generated (115/1641).

"That AI likely wrote the whole thing and John Doe just published it." –hp6

Variations on this theme included the AI automatically generating a summary from existing online content, a human programming the AI to create the content, and an AI writing it but a human name being put on it to take credit or make it more believable.

4.5.3 Meaning of AI-generated.

The most frequent interpretation of AI-generated was that no human was involved in the creation process (592/1641).

"That no human wrote or contributed to the story. That AI is taking away journalism jobs from humans." –n241

Interpretations of the AI automatically aggregating or summarizing online information came up again, as well as new interpretations such as an AI taking existing online content and modifying it or directly serving content from the internet. Some responses still interpreted AI-generated as involving humans in the loop (154/1641), but these repeated the themes for what AI assistance meant.


5 DISCUSSION

We found partial support for our first hypothesis, that people would view content more positively when the assigned creator was purely human versus when they believed AI was involved. Participants were more satisfied with content when the assigned creator was purely human versus when it was AI-generated. However, contrary to our expectations, assigned creator had no effect on perceptions of content originality, presentation, trustworthiness, or informativeness. Assigned creator also had no effect on context-specific dimensions, such as how enjoyable, trustworthy, approachable, useful, or funny the content was. Responses to our assigned creator check corroborated the lack of weight participants placed on assigned creator, with only a third of participants (33.2%) noting the difference between versions. This lack of reliance on creator as a signal is aligned with prior findings about how users evaluate the credibility of online information [8, 20].

We found only marginal support for our second hypothesis, that negative perceptions toward the use of AI would be stronger in contexts that are considered more high risk and objective. When indicating what they would share, participants chose the version where the assigned creator was purely human only slightly more than when it was AI-generated in the health context and we saw no differences in the other contexts.

We found additional evidence that participants do not heavily factor in the use of AI in our qualitative analyses, where in all contexts < 5% of participants reported using assigned creator as a deciding factor when choosing between versions. Unlike a majority of prior work on the signals that users rely on to evaluate content credibility, external factors like personal experience and personal relevance more heavily influenced participant choices. Similar to findings from Geeng et al.’s work, open-ended responses suggest that the variation in topics within each context triggered personal preferences [25]. Most prior work on content evaluations has limited stimuli to headlines or shorter text, content in the abstract, or contexts participants may have had less personal interest or investment in (e.g., financial recommendations when the participant is not actually investing their own money). We believe our use of longer content, varying topics, and multiple contexts may be more similar to actual user behaviors on the internet, and thus more predictive of the effects that labeling content to indicate the use of AI will have. We therefore think that telling people AI was used is unlikely to affect their choice of content or their judgments of credibility, originality, approachability, trustworthiness, and other factors.

Our results instead showed that assigned creator affected participant perceptions of the creator themselves. Participants had more positive views of the creator’s qualifications and effort when the assigned creator was purely human than when they thought AI was used to assist or generate the content. Although effect sizes were small, we believe the consistency of the effect on both metrics related to creator perceptions supports the validity of these findings. Furthermore, we found these sentiments reflected in some of the comments about the meaning of AI assistance:

"Created by an incompetent illiterate incapable of writing simple sentences relying on AI instead." –t225

"It means that John Doe isn’t a very competent writer. He needs the help of AI to add flair to his writing." –n325

These comments suggest that even if use of AI may not affect choices between content or content judgments, it may strongly affect a creator’s credibility or reputation among their audience.

5.1 Why does this matter?

We posed the question: if you can’t tell, does it matter? Our findings suggest that in cases where people are evaluating the content, particularly along dimensions that go beyond credibility, it may not. This calls into question the demand for disclosing the use of AI in content creation. A more dystopian interpretation might even suggest that although users say they care about knowing if AI is used, it may not affect their behaviors, similar to the privacy paradox—where people say privacy is a primary concern, but continue to reveal personal information for relatively small rewards [33]. In this interpretation, as long as AI-generated content is indistinguishable from human created work, users may complain about the use of AI, but will not change their consumption behaviors even if the internet is flooded with AI-generated content. Instead of focusing on revealing the use of AI, we suggest that research should explore how AI use changes content and the best ways to signal those changes to users.

Our findings also highlight how AI use may affect perceptions of creators. These effects may become increasingly important as social media platforms continue to proliferate and more people use the internet to build social connections and monetize a community of followers. In these situations, the relationship between influencers and viewers is a critical part of the creator economy. Although creators have volunteered information about their use of AI in the past [9], our findings suggest that as AI use becomes less novel, creators may become reluctant to disclose their use of AI assistance if it will negatively impact viewer perceptions. However, platform requirements [44] and technological interventions like watermarking may automatically reveal when AI was used. This unsolicited divulgence of AI use could lead creators to reject or abandon such platforms, driving them to publish in "walled gardens" and creating a negative feedback loop or a drought of human-created content on the open web.

We found five themes for what participants thought AI use meant for the content creation process, revealing that users don't have a good understanding of how AI use might affect the content they're consuming. Participant reasoning was heavily tied to when they thought work was being handed off between the human and the AI and was grounded in their mental models of existing technologies. When these associations were incorrect, we saw evidence of participants drawing faulty conclusions about the content. For example, when participants mapped AI to computers, they assumed that the content would be more factual or accurate. However, these mental models may shift as users become more familiar with AI tools and use becomes more commonplace.

Overall, we believe that we stand at an inflection point. The research community, companies, governments, and the media have an opportunity to influence how people think of these tools. By helping users build a more nuanced understanding of what AI tools are capable of and mapping them to existing technologies, the research community could have an outsized impact on their long-term success or failure. Expanding on and replicating our results to better understand what effective representations and signals might be for users could help to guide the conversation with regulators, reducing the likelihood of misunderstandings and confusion [45]. For example, explaining when and where handoffs between humans and AIs occur, developing measures to explain the magnitude or impact of AI edits, and framing functionality in terms of existing technologies [8] may reduce negative perceptions of the creator while still nudging users to think more critically about the credibility of what they're looking at.


6 LIMITATIONS AND FUTURE WORK

Our work explored the effects of assigned creator only for long-form text content in four contexts. Future work could examine how generalizable these effects are when extended to other forms of media, such as images, video, or audio, or to other contexts, such as finance, shopping, or education.

We explored the effects of assigned creator using single-item measurements due to length and implementation limitations on our survey platform, and although we used the Dunn-Šídák correction in our post hoc tests, we still performed multiple comparisons across responses to each question. We would therefore advise that significant differences, particularly marginal effects, be considered cautiously, and future work could develop more standardized and robust multi-item measurements. We also acknowledge that not all of the forced-choice questions were realistic. For example, asking which destination participants would choose to visit may make sense, but there was no reason they couldn't choose to share all three jokes. Future work might create or contrast between more realistic contextual choices, such as shopping, booking a tour, or making a recipe.
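As a concrete illustration of why we advise this caution, the sketch below (our own illustrative example, not the analysis code used in this study) shows how the standard Dunn-Šídák adjustment lowers the per-comparison significance threshold as the number of post hoc comparisons grows, which is why individual marginal effects should be interpreted conservatively.

```python
# Illustrative only: the standard Dunn-Šídák adjustment, not the authors' analysis code.
# For m comparisons at a family-wise error rate alpha, each individual test
# is evaluated at alpha_per_test = 1 - (1 - alpha) ** (1 / m).

def sidak_alpha(alpha: float, m: int) -> float:
    """Per-comparison significance threshold under the Dunn-Šídák correction."""
    return 1 - (1 - alpha) ** (1 / m)

# Example: three pairwise comparisons among the assigned-creator conditions
# (human, human with AI assistance, AI) at a family-wise alpha of .05.
print(round(sidak_alpha(0.05, 3), 4))  # ~0.017
```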

Finally, although we explored how perceived AI use affected content judgments, our findings suggest that testing better ways to tell people that AI was used may be the wrong direction to go. Many of the modifications or changes that AI technologies can make to content are not new and could already be done with existing software. For example, deepfakes existed long before the recent introduction of generative AI models. Instead, what AI promises to do is make these modifications easier, faster, or cheaper. Rather than focus on AI, future work might instead look for better ways to indicate how content has been edited, such as how much of the content has been changed or what edits have been made. Research might also examine what signals matter most to users in making content judgments, particularly in contexts that go beyond evaluations of credibility.
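To make the idea of an edit-magnitude signal more concrete, here is a minimal, hypothetical sketch (our own example, not a mechanism proposed or evaluated in this paper) that uses Python's standard difflib library to express how much of a draft was changed, regardless of whether the editor was a human or an AI.

```python
import difflib

def edit_magnitude(original: str, revised: str) -> float:
    """Fraction of text that differs between two versions
    (0.0 = unchanged, 1.0 = completely rewritten)."""
    similarity = difflib.SequenceMatcher(None, original, revised).ratio()
    return 1.0 - similarity

draft = "The museum is open daily and admission is free on Sundays."
edited = "The museum welcomes visitors every day, with free admission on Sundays."
print(f"{edit_magnitude(draft, edited):.0%} of the text changed")
```

A signal like this says nothing about who or what made the edits; it only quantifies how much the content moved, which is the kind of measure this line of future work would need to refine and validate with users.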


7 CONCLUSION

We found that when the assigned creator involved the use of AI, it had a negative effect on perceptions of the creator and on satisfaction, but unexpectedly had no effect on content judgments, such as credibility, originality, presentation, or trustworthiness. We found only a marginal effect of assigned creator when participants chose between versions, with participants choosing health articles slightly more often when the assigned creator was purely human than when it was AI-generated. We also identified five interpretations of the effects AI use had on the content creation process, divided by when participants believed handoffs between the human and AI occurred. These interpretations were mapped from their mental models of existing technologies, resulting in some unexpected assumptions, such as AI-generated content being more factual and accurate.

Revealing when AI has been used in the content creation process may be important, particularly when users are no longer able to tell the difference between human, AI assisted, and AI-generated work. However, our findings suggest that explicitly labeling content this way may not have the intended effect of helping users to judge content credibility and instead may negatively reflect on the creator or give users more (incorrect) confidence in the accuracy of the information. We are still in the early days of using AI to populate the internet with content and have a unique opportunity to guide users in understanding what this means for the information they’re seeing. We hope that our contributions move us a step closer to figuring out how and what to communicate to users so that they are able to effectively evaluate content and avoid the consequences of misinformation.


Supplemental Material

Video Presentation (mp4, 66.5 MB)

References

  1. [n. d.]. AI Label. https://ai-label.org/. Accessed: 2023-08-28.
  2. [n. d.]. ChatGPT. https://chat.openai.com/. Accessed: 2023-08-31.
  3. [n. d.]. DallE. https://openai.com/dall-e-2. Accessed: 2023-08-31.
  4. [n. d.]. Google Bard. https://bard.google.com/. Accessed: 2023-08-31.
  5. [n. d.]. Midjourney. https://www.midjourney.com/home?callbackUrl=%2Fexplore. Accessed: 2023-08-31.
  6. [n. d.]. World Internet User. https://www.kaggle.com/datasets/elmoallistair/world-internet-user. Accessed: 2023-11-29.
  7. David Buckingham. 2007. Youth, identity, and digital media. The MIT Press.
  8. Jason W Burton, Mari-Klara Stein, and Tina Blegind Jensen. 2020. A systematic review of algorithm aversion in augmented decision making. Journal of Behavioral Decision Making 33, 2 (2020), 220–239.
  9. Eileen Cartter. [n. d.]. The Pope Francis Puffer Photo Was Real in Our Hearts. GQ ([n. d.]). https://www.gq.com/story/pope-puffer-jacket-midjourney-ai-meme
  10. Noah Castelo, Maarten W Bos, and Donald R Lehmann. 2019. Task-dependent algorithm aversion. Journal of Marketing Research 56, 5 (2019), 809–825.
  11. Christer Clerwall. 2017. Enter the robot journalist: Users' perceptions of automated content. In The Future of Journalism: In an Age of Digital Media and Economic Uncertainty. Routledge, 165–177.
  12. Berkeley J Dietvorst, Joseph P Simmons, and Cade Massey. 2015. Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General 144, 1 (2015), 114.
  13. Yogesh K Dwivedi, Nir Kshetri, Laurie Hughes, Emma Louise Slade, Anand Jeyaraj, Arpan Kumar Kar, Abdullah M Baabdullah, Alex Koohang, Vishnupriya Raghavan, Manju Ahuja, et al. 2023. "So what if ChatGPT wrote it?" Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. International Journal of Information Management 71 (2023), 102642.
  14. Jonathan St BT Evans. 2008. Dual-processing accounts of reasoning, judgment, and social cognition. Annual Review of Psychology 59 (2008), 255–278.
  15. Gunther Eysenbach and Christian Köhler. 2002. How do consumers search for and appraise health information on the world wide web? Qualitative study using focus groups, usability tests, and in-depth interviews. BMJ 324, 7337 (2002), 573–577.
  16. Andrew J Flanagin and Miriam J Metzger. 2000. Perceptions of Internet information credibility. Journalism & Mass Communication Quarterly 77, 3 (2000), 515–540.
  17. Andrew J Flanagin and Miriam J Metzger. 2007. The role of site features, user attributes, and information verification behaviors on the perceived credibility of web-based information. New Media & Society 9, 2 (2007), 319–342.
  18. Andrew J Flanagin and Miriam J Metzger. 2008. Digital media and youth: Unparalleled opportunity and unprecedented responsibility. MacArthur Foundation Digital Media and Learning Initiative, Cambridge, MA, USA.
  19. Brian J Fogg, Jonathan Marshall, Othman Laraki, Alex Osipovich, Chris Varma, Nicholas Fang, Jyoti Paul, Akshay Rangnekar, John Shon, Preeti Swani, et al. 2001. What makes web sites credible? A report on a large quantitative study. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 61–68.
  20. Brian J Fogg, Cathy Soohoo, David R Danielson, Leslie Marable, Julianne Stanford, and Ellen R Tauber. 2003. How do users evaluate the credibility of Web sites? A study with over 2,500 participants. In Proceedings of the 2003 Conference on Designing for User Experiences. 1–15.
  21. John W Fritch. 2003. Heuristics, tools, and systems for evaluating Internet information: Helping users assess a tangled Web. Online Information Review 27, 5 (2003), 321–327.
  22. Christoph Fuchs, Martin Schreier, and Stijn MJ Van Osselaer. 2015. The handmade effect: What's love got to do with it? Journal of Marketing 79, 2 (2015), 98–110.
  23. Catherine A Gao, Frederick M Howard, Nikolay S Markov, Emma C Dyer, Siddhi Ramesh, Yuan Luo, and Alexander T Pearson. 2023. Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers. NPJ Digital Medicine 6, 1 (2023), 75.
  24. Simson Garfinkel, Jeanna Matthews, Stuart S Shapiro, and Jonathan M Smith. 2017. Toward algorithmic transparency and accountability. 5 pages.
  25. Christine Geeng, Savanna Yee, and Franziska Roesner. 2020. Fake news on Facebook and Twitter: Investigating how people (don't) investigate. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–14.
  26. Michael H Goldhaber. 1997. The attention economy and the net. First Monday (1997).
  27. Manuel Goyanes. 2014. An empirical study of factors that influence the willingness to pay for online news. Journalism Practice 8, 6 (2014), 742–757.
  28. Andreas Graefe, Mario Haim, Bastian Haarmann, and Hans-Bernd Brosius. 2018. Readers' perception of computer-generated news: Credibility, expertise, and readability. Journalism 19, 5 (2018), 595–610.
  29. Armin Granulo, Christoph Fuchs, and Stefano Puntoni. 2021. Preference for human (vs. robotic) labor is stronger in symbolic consumption contexts. Journal of Consumer Psychology 31, 1 (2021), 72–80.
  30. Joo-Wha Hong. 2018. Bias in perception of art produced by artificial intelligence. In Human-Computer Interaction. Interaction in Context: 20th International Conference, HCI International 2018, Las Vegas, NV, USA, July 15–20, 2018, Proceedings, Part II 20. Springer, 290–303.
  31. Daniel Kahneman. 2011. Thinking, fast and slow. Macmillan.
  32. Nils Köbis and Luca D Mossink. 2021. Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry. Computers in Human Behavior 114 (2021), 106553.
  33. Spyros Kokolakis. 2017. Privacy attitudes and privacy behaviour: A review of current research on the privacy paradox phenomenon. Computers & Security 64 (2017), 122–134.
  34. Sarah Kreps, R Miles McCain, and Miles Brundage. 2022. All the news that's fit to fabricate: AI-generated text as a tool of media misinformation. Journal of Experimental Political Science 9, 1 (2022), 104–117.
  35. Michael Levenson. [n. d.]. Science Fiction Magazines Battle a Flood of Chatbot-Generated Stories. New York Times ([n. d.]). https://www.nytimes.com/2023/02/23/technology/clarkesworld-submissions-ai-sci-fi.html
  36. Jennifer M Logg, Julia A Minson, and Don A Moore. 2019. Algorithm appreciation: People prefer algorithmic to human judgment. Organizational Behavior and Human Decision Processes 151 (2019), 90–103.
  37. Chiara Longoni, Andrea Bonezzi, and Carey K Morewedge. 2019. Resistance to medical artificial intelligence. Journal of Consumer Research 46, 4 (2019), 629–650.
  38. Chiara Longoni, Andrey Fradkin, Luca Cian, and Gordon Pennycook. 2022. News from generative artificial intelligence is believed less. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 97–106.
  39. Marc Meola. 2004. Chucking the checklist: A contextual approach to teaching undergraduates web-site evaluation. portal: Libraries and the Academy 4, 3 (2004), 331–344.
  40. Miriam J Metzger. 2007. Making sense of credibility on the Web: Models for evaluating online information and recommendations for future research. Journal of the American Society for Information Science and Technology 58, 13 (2007), 2078–2091.
  41. Miriam J Metzger, Andrew J Flanagin, and Ryan B Medders. 2010. Social and heuristic approaches to credibility evaluation online. Journal of Communication 60, 3 (2010), 413–439.
  42. Taylor Nelson, Nicole Kagan, Claire Critchlow, Alan Hillard, and Albert Hsu. 2020. The danger of misinformation in the COVID-19 crisis. Missouri Medicine 117, 6 (2020), 510.
  43. John Newhagen and Clifford Nass. 1989. Differential criteria for evaluating credibility of newspapers and TV news. Journalism Quarterly 66, 2 (1989), 277–284.
  44. Jennifer Flannery O'Connor and Emily Moxley. [n. d.]. Our approach to responsible AI innovation. YouTube ([n. d.]). https://blog.youtube/inside-youtube/our-approach-to-responsible-ai-innovation/
  45. Emilee Rader, Kelley Cotter, and Janghee Cho. 2018. Explanations as mechanisms for supporting algorithmic transparency. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 1–13.
  46. Martin Ragot, Nicolas Martin, and Salomé Cojean. 2020. AI-generated vs. human artworks. A perception bias towards artificial intelligence? In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems. 1–10.
  47. Reuters. [n. d.]. US lawmaker urges labelling, restrictions on AI content. Reuters ([n. d.]). https://www.reuters.com/technology/us-lawmaker-urges-labelling-restrictions-ai-content-2023-06-29/
  48. Goldman Sachs. [n. d.]. The creator economy could approach half-a-trillion dollars by 2027. Goldman Sachs ([n. d.]). https://www.goldmansachs.com/intelligence/pages/the-creator-economy-could-approach-half-a-trillion-dollars-by-2027.html
  49. McKenzie Sadeghi and Lorenzo Arvanitis. [n. d.]. EU wants Google, Facebook to start labeling AI-generated content. Politico ([n. d.]). https://www.politico.eu/article/chatgpt-dalle-google-facebook-microsoft-eu-wants-to-start-labeling-ai-generated-content/
  50. Adam Satariano and Paul Mozur. [n. d.]. The People Onscreen Are Fake. The Disinformation Is Real. New York Times ([n. d.]). https://www.nytimes.com/2023/02/07/technology/artificial-intelligence-training-deepfake.html
  51. S Shyam Sundar and Clifford Nass. 2001. Conceptualizing sources in online news. Journal of Communication 51, 1 (2001), 52–72.
  52. Stuart A. Thompson. [n. d.]. Making Deepfakes Gets Cheaper and Easier Thanks to A.I. NY Times ([n. d.]). https://www.nytimes.com/2023/03/12/technology/deepfakes-cheapfakes-videos-ai.html
  53. Tiffany Tsu and Steven Lee Myers. [n. d.]. A.I.'s Use in Elections Sets Off a Scramble for Guardrails. NY Times ([n. d.]). https://www.nytimes.com/2023/06/25/technology/ai-elections-disinformation-guardrails.html
  54. Hille AJ van der Kaa and Emiel J Krahmer. 2014. Journalist versus news consumer: The perceived credibility of machine written news. In Computation + Journalism Symposium 2014.
  55. Mason Walker and Katerina Eva Matsa. [n. d.]. News Consumption Across Social Media in 2021. Pew Research Center ([n. d.]). https://www.pewresearch.org/journalism/2021/09/20/news-consumption-across-social-media-in-2021/
  56. Karen Weise and Cade Metz. [n. d.]. When A.I. Chatbots Hallucinate. NY Times ([n. d.]). https://www.nytimes.com/2023/05/01/business/ai-chatbots-hallucination.html
  57. Axel Westerwick. 2013. Effects of sponsorship, web site design, and Google ranking on the credibility of online information. Journal of Computer-Mediated Communication 18, 2 (2013), 194–211.
  58. Michael Yeomans, Anuj Shah, Sendhil Mullainathan, and Jon Kleinberg. 2019. Making sense of recommendations. Journal of Behavioral Decision Making 32, 4 (2019), 403–414.
  59. Lixuan Zhang, Iryna Pentina, and Yuhong Fan. 2021. Who do you choose? Comparing perceptions of human vs robo-advisor in the context of financial services. Journal of Services Marketing 35, 5 (2021), 634–646.
