Analysis of biased language in peer-reviewed scientific literature on genetically modified crops

Social, political, and economic forces may inadvertently influence the stance of scientific literature. Scientists strive for neutral language, but this may be challenging for controversial topics like genetically modified (GM) crops. We classified peer-reviewed journal articles and found that 40% had a positive or negative stance towards GM crops. Proportion of positive and negative stance varied with publication date, authors’ country of origin, funding source, and type of genetic modification. Articles with a negative stance were more common at the beginning of the millennium. Authors from China had the highest positive:negative ratio (8:1), followed by authors from the USA (12:5) and the EU (5:7). Positive stance articles were six times more likely to be funded by private sources compared to those with a neutral or negative stance. Articles about glyphosate were more likely to be negative compared to articles about Bacillus thuringiensis. Linguistic features of articles with positive and negative stances were used to train a random forest classifier that predicts stance significantly better than random chance. This suggests the possibility of an automated tool to screen manuscripts for unintended biased language prior to publication.


Introduction
Public opinion on the safety of genetically modified (GM) food indicates a disconnect with the current state of scientific consensus. A 2015 poll by the Pew Research Center revealed that 37% of US adults believe GM foods are safe to consume while 88% of scientists within the American Association for the Advancement of Science believe them to be safe [1]. This disparity between the opinions of the general public and scientists is a concern for consumers, regulatory agencies, and producers of GM products. The internet is the main pipeline of information connecting published scientific research and public awareness through journalism [2]. Stance in favor of or against the use of GM crops can be subtly present in the language of peer-reviewed journal articles, conveyed to media outlets and amplified into public opinion [3]. Articles with a negative stance on GM foods are more likely to become the source of rumors [4], while a positive stance in scientific literature may shift public perception about GM foods toward a greater belief in their safety [5,6]. On the other hand, confirmation bias leads to overconfidence in a particular opinion [7]. The purpose of this study was to quantify stance in published literature, and to identify linguistic patterns in positive and negative literature that could propagate from scientists into popular culture and may serve as potential sources of polarized public opinion about GM crops.
A wide variety of GM crops have been developed over the past 38 years [8]. The two most common genetic modifications are resistance to the broad-spectrum herbicide glyphosate (aka Roundup™), and incorporation of Cry protein genes from Bacillus thuringiensis (Bt) to make crops toxic to insect pests [3,9]. Glyphosate-resistant crops were first approved in 1996 and now account for more than 90% of cotton, corn, and soybean crops in the US [3]. Bt was isolated in 1901 and the first commercially successful Bt crops were introduced in 1996 [10]. Many GM crops contain both genetic modifications so that farmers can efficiently eliminate weeds with glyphosate and avoid the use of insecticides because the crop simultaneously withstands the herbicide and is no longer threatened by common insect pests. Other genetic modifications have been developed that help crops resist diseases [11] and increase their nutritional value, such as the insertion of the genes required to synthesize vitamin A in golden rice [12].
There are many reasons for negative and positive attitudes about GM crops. A major concern held by the general public regards the safety and nutritional quality of GM foods [13], although recent reviews of scientific studies provide evidence that GM foods are substantially equivalent to non-GM foods [13][14][15][16]. The use of glyphosate-resistant crops has increased yields and profits for farmers [17][18][19], but there are potential negative impacts of the drastic increase in the use of glyphosate worldwide [3] such as the evolution of glyphosate resistant weeds [20]. Also, glyphosate is slow to degrade and can accumulate as residue on plants, and more research is needed to determine if chronic exposure results in deleterious effects [21]. Recent research linking cancer to glyphosate exposure is being scrutinized [22,23]. Another serious concern is that GM crops can hybridize with wild relatives [24][25][26]. There is also the potential for Bt crops to adversely impact populations of non-target organisms [27], although evidence suggests this is not the case [28][29][30]. To the contrary, the use of Bt crops has substantially reduced the use of much more toxic insecticides, and this has helped protect populations of non-target insects [3,31].
Despite reasons for positive and negative attitudes about GM foods, scientists strive for objectivity and are expected to convey a neutral stance in their writings. The intention of this study is not to criticize the use of any particular position or stance when conveying information regarding biotechnology, but rather to raise awareness of the subtle means in which authors can convey a position. A positive stance indicates that an author is optimistic about the potential for GM foods; a negative stance could indicate a concern for the environment. While both perspectives are equally important in the overall discussion of biotechnology, scientific literature is typically idealized as lacking opinion. Yet, many scientific articles convey a position or particular stance through subtle linguistic features that go beyond just the words used and include grammatical features.
Characteristics of language such as stance can be reflected in the writing style of authors. For example, in financial writing, reports that present lower earnings are harder to read (e.g. more syllables, more words) [32]. Compared to genuine scientific writing, fraudulent papers are characterized by fewer adjectives and more citations [33,34]. Writing by students and poets with depression has distinct linguistic characteristics [35,36]. We are not suggesting that authors of papers with a negative or positive stance toward GM crops are depressed, obfuscating results, or deceptive, but rather that stance may be conveyed with unique, subtle linguistic characteristics.
The purpose of this study was to quantify and analyze stance across peer-reviewed scientific literature published between 2000 and 2018 and test the hypotheses that (a) there exist subsets of peer-reviewed journal articles with detectable positive, neutral, or negative stance toward GM crops, and their proportions will vary with publication date, authors' country of origin, funding source and type of genetic modification; (b) characteristic linguistic features of papers with a positive or negative stance can be used to predict the stance of an article; and (c) articles with similar stances tend to cite similar articles.

Search strategy
The aim of our literature search was to gather published, peer-reviewed journal articles discussing GM crops within the context of the environment, as opposed to purely laboratory studies. Our goal was to quantify stance and analyze the linguistic characteristics of these articles, not to assess the actual results of individual papers. To quantify stance in published literature on GM crops, we first narrowed our focus to articles related to GM crops that were commercially available in 2014 and mentioned environmental impacts as a key term [37]. Our search was restricted to English language articles that were published after 2000. We conducted our first search in July 2014 and updated our database in August 2018 to include newer articles.
Search strings used to gather published articles were composed of three essential parts connected by an AND operator. Each of the three parts (crops, environmental impact, and genetic modification) included multiple key terms, some with wildcards to capture multiple variations of words, connected with an OR operator. Corn, rapeseed, cottonwoods, rice, wheat, soybean, tobacco, cotton, subspecies of Brassica rapa (e.g. turnips and bok choy), beets, beans, creeping bentgrass, field pumpkins, potatoes, alfalfa, sugarcane, peppers, tomatoes, Petunias, papayas, eggplants, chicory, plums, muskmelon, roses, flax, and carnations were included in our search. We searched eight databases.
Part 2: ('environmental assessment' OR 'environmental control' OR 'environmental degradation' OR 'environmental effects' OR 'environmental impact' OR 'environmental management' OR 'environmental protection' OR biosafety OR 'risk assessment' OR ecology OR ecological OR ecosystem OR biodiversity OR 'biological diversity' OR 'species diversity' OR 'species richness' OR 'biological input' OR 'biological output' OR landscape OR regional OR EIQ OR 'environmental impact quotient' OR 'ecosystem service' OR 'gene flow'). This part restricts our search to articles that mention terms related to environmental impacts.
Part 3: (GM OR GMO* OR 'genetic engineering' OR 'genetic transformation' OR 'genetically engineered' OR 'genetic erosion' OR 'genetic contamination' OR 'genetic manipulation' OR 'genetically modified' OR 'genetic modification' OR transgenics OR transgenic OR transgenes OR introgression). This part focuses our search on articles using genetic modification terminology.
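The AND/OR structure of the search strings described above can be sketched in a few lines of Python. This is an illustration only: the helper names (or_group, build_query) and the short term lists are hypothetical, and the real term lists were much longer and may have been formatted differently for each database.

```python
def or_group(terms):
    """Join terms with OR; quote multi-word phrases; wrap the group in parentheses."""
    quoted = [f"'{t}'" if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*parts):
    """Connect the OR-groups with AND, one group per part of the search."""
    return " AND ".join(or_group(p) for p in parts)


# Hypothetical, heavily shortened term lists for the three parts.
crops = ["corn", "Zea mays", "soybean"]
impact = ["environmental impact", "biosafety", "gene flow"]
modification = ["GMO*", "transgenic", "genetically modified"]

query = build_query(crops, impact, modification)
print(query)
```

Keeping each part as a separate list makes it easy to adjust one group of terms without touching the others.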
Metadata was collected into a RefWorks account and downloaded into .csv files. All available PDF files were downloaded. For many unavailable files, a request for documents was made through interlibrary loan in an attempt to capture as many journal articles as possible. Files without selectable text were removed. Books and non-English PDFs were also removed from the database. Using the title of each article, duplicates and articles not matching metadata were removed.

Classification of stance
To classify journal articles as positive, neutral or negative stance, each article was randomly assigned a number between 1 and 1873. Due to quality filtering, 114 of these articles were removed from subsequent classification and analyses because they did not match an article in our metadata or because they were a duplicate, resulting in a total of 1759 articles. Articles 1-500 were classified by at least four individuals. Articles were classified into one of five categories: 'Positive', 'Negative', 'Neutral', 'Does not discuss GM crops', and 'No consensus'. Individual examiners were trained to look for key terms suggesting stance in the context of genetic modification terminology (e.g. 'risks' or 'benefits' of GM foods or technology); full instructions for the examiners are in table S2. To ensure accountability, and to collect information on relevant sentences, when 'Positive' or 'Negative' was chosen, the examiners recorded the sentences that indicated the stance in their classification. In an effort to reduce the time required to classify articles, individuals were instructed to focus on the abstract, introduction, discussion, and conclusion, specifically on paragraphs that discussed GM crops. An article was classified as 'Does not discuss GM crops' if search terms did not match text in the main sections of the article (e.g. terms were exclusive to references). To verify that these disqualified articles did not discuss GM crops, a search was conducted within each article using three search terms ('transg', 'engin', 'modif'). Classification results were recorded in a Google spreadsheet using Google form submissions.

Data analysis
The stance of articles numbered 21 through 500 (n = 451) was classified by four different individuals. We originally intended to analyze 500 articles; however, some PDFs were removed because of poor quality and a subset was used for training purposes. Of the 451 journal articles analyzed, at least three of the four independent evaluators agreed on the stance of 230 articles. These 230 consensus articles were used in the final analyses.
The number of journal articles with a consensus stance of neutral, positive, or negative was standardized per year. A cumulative sum was calculated for each of the three stance categories, divided by the total articles for each year, and scaled with a minimum of zero and maximum of one, so that categories could be compared with one another. Metadata from RefWorks was used to determine the country of authorship for the first author and the primary source of funding, which we assumed was the first funding source listed. Funding was found by searching for 'funding', 'supported', or 'acknowledgements', or by visually searching the end of the paper. Funding sources with 'national', 'federal', or 'ministry' in their names were considered public. Non-profit funding was rare, so it was grouped with public funding (government agencies). Articles with unknown funding sources or without funding sources (50 articles) were not included in the analysis of funding sources, leaving 180 articles for the funding analysis.
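The consensus rule (at least three of four evaluators agreeing) can be illustrated with a short sketch. The function name consensus_label and the min_agree parameter are hypothetical, and the sketch assumes each article's four classifications are available as a list of label strings.

```python
from collections import Counter


def consensus_label(ratings, min_agree=3):
    """Return the majority label if at least min_agree raters chose it,
    otherwise 'No consensus'."""
    label, count = Counter(ratings).most_common(1)[0]
    return label if count >= min_agree else "No consensus"


print(consensus_label(["Positive", "Positive", "Neutral", "Positive"]))
print(consensus_label(["Positive", "Positive", "Neutral", "Neutral"]))
```

With four raters and a threshold of three, at most one label can reach consensus, so ties do not need special handling.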
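A keyword rule such as the one for public funding can be sketched as below. This is an assumption-laden illustration: the function name classify_funder, the whole-word matching, and the 'needs review' fallback are hypothetical, since the text does not state how the remaining funders were assigned to the private category.

```python
import re

# Keywords named in the text as indicating public funding.
PUBLIC_PATTERN = re.compile(r"\b(national|federal|ministry)\b", re.IGNORECASE)


def classify_funder(name):
    """Label a funder name as 'public', 'unknown', or 'needs review'."""
    if not name or not name.strip():
        return "unknown"  # no funding source listed
    if PUBLIC_PATTERN.search(name):
        return "public"
    # Assumption: anything else is checked by hand (e.g. private or non-profit).
    return "needs review"


print(classify_funder("National Science Foundation"))
print(classify_funder("Example Seed Company"))
```

Matching whole words avoids flagging names such as 'International ...' merely because they contain the substring 'national'.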
To examine the frequency that different topics were mentioned within the 230 articles with a consensus positive, negative, or neutral stance, we analyzed the occurrence of lowercase strings (sequences of characters) such as ' bt ' (with spaces between quotes), 'glyphosate' , 'cancer' and Latin names from Part 1 of our search. Lowercase text was used for all articles. One-way analysis of variance (ANOVA; scipy.stats version 0.18.1) [38] was used to determine differences in the abundance of text strings in articles with different stances.
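Counting a lowercase string in an article can be sketched as follows. The helper count_string and the sample sentence are hypothetical; the sketch only shows why ' bt ' is padded with spaces, and the ANOVA step is not reproduced here.

```python
def count_string(text, needle):
    """Count non-overlapping occurrences of needle in the lowercased text."""
    return text.lower().count(needle)


sample = ("Bt cotton and Bt maize express Cry proteins; "
          "the Bt toxin is specific. Glyphosate use rose.")

bt_hits = count_string(sample, " bt ")
glyphosate_hits = count_string(sample, "glyphosate")
print(bt_hits, glyphosate_hits)
```

The padded string skips words that merely contain 'bt' (such as 'obtain'), at the cost of missing occurrences at the very start of a text or directly next to punctuation.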

Supervised classification
The same 230 articles that were used in the analyses described above were used in the supervised classification. All files were converted from PDF to plain text using AntFile Converter [39]. These files were then processed with the Biber tagger [40] to linguistically annotate each file, assigning each word in each file a part of speech (POS) label (table S4).
The accuracy of the Biber POS tagger has been established in several studies. Most recently, Gray (2019) [41] evaluated the precision and recall of most of the linguistic features used in the present study. The majority of these features had an accuracy of between 95% and 100%. It is important to note that these rates include features such as relative clause structures, as in the text excerpt below, and different types of complement clauses (e.g. noun complements: 'The idea that GM foods'; verb complements: 'It is thought that GM foods'; adjective complement clauses: 'It is important that scientists'), and not just finite features such as articles (a, an, the) or modals (will, can, should, might, etc).
These linguistically annotated files were then processed using a program that provides counts for the linguistic features in each text. This program also includes dictionaries to further annotate certain features. For example, verbs are labeled with semantic categories-e.g. as mental verbs (think, believe) or communication verbs (say, state)-and adjectives can be categorized as relating to size, time, color, evaluation. The counts for these features were normalized to allow for accurate comparisons across texts of unequal length. Each of the feature counts were normed to a rate of per 1000 words.
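Norming a feature count to a rate per 1000 words can be sketched as below. The function names are hypothetical, and the plain word match for possibility modals is a stand-in for illustration only; it is not the Biber tagger, which relies on part-of-speech annotation.

```python
import re

POSSIBILITY_MODALS = {"may", "might", "could"}


def per_thousand(count, n_words):
    """Normalize a raw count to a rate per 1000 words."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return 1000 * count / n_words


def modal_rate(text):
    """Rate of possibility modals per 1000 words, using a naive word match."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum(word in POSSIBILITY_MODALS for word in words)
    return per_thousand(hits, len(words))


print(per_thousand(12, 4800))  # 12 occurrences in a 4800-word text
print(modal_rate("This might work and it could fail"))
```

Normalizing this way lets a short article and a long article be compared on the same scale.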
We used a random forest model to predict the stance of an article. Supervised classification of journal articles was performed using the Python (version 3.6) package sklearn (version 0.18) [42]. One-way ANOVA was used to determine differences in the abundance of words beginning with 'transg' and to determine the best features for a random forest model. ANOVA results of each tag count of positive, negative, and neutral categories were ranked in order of increasing p value. The top 12 features were chosen to predict the stance of an article from among the positive, negative, and neutral articles (figure 4, table S5). Articles were split, using the train_test_split function, into training (70%) and test (30%) sets to be used in a grid search to determine random forest parameters. The GridSearchCV function was used to find the ideal parameters among the number of estimators (200 or 500), number of features ('auto', 'sqrt', 'log2'), maximum depth (2-10), and split criterion ('gini' or 'entropy'). Using the best grid search parameters and a fixed random state (RandomForestClassifier function; n_estimators = 200, max_features = 'auto', max_depth = 6, criterion = 'gini', random_state = 42), we created a random forest for 50 random 70/30 training/test data splits (train_test_split function; random_state = 0-49) to minimize the effect of any particular data split. Random chance for each training/test split was calculated as (Positive/Total)^2 + (Negative/Total)^2 + (Neutral/Total)^2, where Positive, Negative, and Neutral refer to the number of respective labels in the testing data, and Total refers to the total number of testing labels. Percentages for random chance and accuracy (determined by the accuracy_score function) were averaged for all 50 models.
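The random-chance baseline is the sum of squared label proportions, which can be sketched in plain Python. The function name random_chance is hypothetical. For illustration the sketch uses the overall consensus counts reported in the results (139 neutral, 60 positive, 31 negative), whereas the study computed the baseline on the test labels of each split, so the value here will not match the averaged figure exactly.

```python
from collections import Counter


def random_chance(labels):
    """Sum of squared class proportions: the expected accuracy of guessing
    labels at random in proportion to their frequency."""
    total = len(labels)
    if total == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return sum((count / total) ** 2 for count in counts.values())


# Illustrative labels built from the overall consensus counts.
labels = ["Neutral"] * 139 + ["Positive"] * 60 + ["Negative"] * 31
baseline = random_chance(labels)
print(round(baseline, 3))
```

Because the classes are unbalanced, this baseline sits well above the one-in-three rate that three equally common labels would give.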

Analysis of references
We downloaded the number of times an article was cited and the citations for each article from Web of Science (https://apps.webofknowledge.com) on 7 November 2019 to determine the citation rates of positive and negative articles, and to test the hypothesis that articles tend to cite other articles with similar stance. Unfortunately, not all articles were in the WoS database and six of the eight databases did not provide citations. Consequently, only 186 articles could be included in the analysis of citation communities (i.e. collection of references within an article). Citation rate was calculated as the number of citations between the year of publication and 7 November 2019. These values were averaged for positive, negative, and neutral articles and ANOVA with Tukey's Honestly Significant Difference were performed (MultiComparison and tukeyhsd functions from statsmodels, version 0.9.0 [43]) to determine if there were differences for citation rates among the stances. We performed an analysis to determine if any cited articles were significantly associated with a positive or negative stance. Separately, unique citations for positive and negative articles (n = 64; positive = 44, negative = 20) were used as column headers in a dataframe with article numbers representing rows. Indicator species analyses of citations were used to determine which references were associated with positive or negative stances using the 'multipatt' function in the 'indicspecies' package (version 1.7.6) in R (version 3.3.0) with default values [44]. We performed a post hoc classification of indicator citations, however there were no indicator citations with a positive or negative stance consensus for these six articles.
Presence of specific citations within an article was represented with a 1, absence was denoted by a 0. The Jaccard dissimilarity index was used to compare the differences of the communities of citations. Values range from 0 to 1; a 0 indicates communities that are identical, a 1 indicates communities that are completely dissimilar. We performed alpha (i.e. number of citations per article) and beta diversity (e.g. change in citations between articles) calculations using the diversity functions within skbio (version 0.4.2). An ANOVA was used to compare alpha diversity among categories of stance while a permutational multivariate analysis of variance (PERMANOVA) was used to determine beta diversity differences. For beta diversity and PERMANOVA, citations that were only included in one article (n = 3805) were removed, leaving 19 negative articles and 42 positive articles. Homogeneity of the dispersion of variances was tested using 'betadisper' from the 'vegan' package (version 2.5-4) in R (version 3.3.0). The dispersion of variances was non-homogenous (F_{1,59} = 5.48, p = 0.03). Because sample sizes were also different, we randomly sampled 19 positive articles 50 times and conducted a PERMANOVA, then averaged the F statistics and p values. Beta diversity was plotted using seaborn (version 0.9.0) and the pcoa function from skbio.
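The Jaccard dissimilarity between two reference lists can be sketched with Python sets. The function name and the reference identifiers are hypothetical, and treating two empty lists as identical is an assumption of this sketch; the study itself used the diversity functions in skbio.

```python
def jaccard_dissimilarity(refs_a, refs_b):
    """1 - |A intersect B| / |A union B| for two collections of cited references."""
    a, b = set(refs_a), set(refs_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty reference lists count as identical
    return 1 - len(a & b) / len(union)


# Hypothetical reference identifiers for two articles.
article_1 = ["ref1", "ref2", "ref3"]
article_2 = ["ref2", "ref3", "ref4"]
print(jaccard_dissimilarity(article_1, article_2))
```

Two articles sharing two of four distinct references score 0.5, halfway between identical (0) and completely dissimilar (1).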

Patterns in stance
Articles within our database discussed a total of 23 GM crops; the most common were Zea mays (15%), Brassica napus (14%), Populus (14%), and Oryza sativa (13%) (table S3). From the 451 articles that were examined by four individuals, 65 (14.38%) did not discuss GM crops and were removed from further analyses. Consensus, where three or more individuals agreed on stance classification, was reached for 230 (50.88%) of the examined articles and of these, 139 (60.4%) were neutral, 60 (26.1%) were positive, and 31 (13.5%) were negative, indicating that 40% of the consensus articles contained a stance on GM crops (figure 1(a)). Articles with a negative stance were more common in the early 2000s and those with a neutral and positive stance had relatively steady rates of publication (figure 1(b)).
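As a quick check of the arithmetic behind the stance proportions, the percentages can be recomputed from the reported counts for the 230 consensus articles. The variable names are illustrative only.

```python
# Reported counts for the 230 consensus articles.
counts = {"neutral": 139, "positive": 60, "negative": 31}
total = sum(counts.values())

# Share of each stance, in percent, rounded to one decimal place.
shares = {stance: round(100 * n / total, 1) for stance, n in counts.items()}

# Share of articles with any (positive or negative) stance.
with_stance = round(100 * (counts["positive"] + counts["negative"]) / total, 1)

print(total, shares, with_stance)
```

The combined positive and negative share comes to 39.6%, which is the roughly 40% reported above.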
Data was summarized for countries and regions with the largest sample sizes, USA, China, and the EU, and all other countries were grouped together as 'other'. Authors from all countries and regions had a similar proportion of papers with a neutral stance (figure 1(c)). Authors from China had the highest positive:negative stance proportion (8:1), followed by authors from the USA (12:5) and other countries (35:18). On the other hand, papers by authors from the EU were more negative than positive (5:7; figure 1(c)).
The relative abundance of neutral, positive and negative articles also differed among funding sources. The funding sources for 50 articles (22%) were unknown, 162 articles (70%) were supported by public funding and 18 (8%) were supported by private funding. Articles supported by public funding were 65% neutral, 20% positive, and 15% negative. In contrast, articles supported by private funding were 44% neutral, 50% positive, and 6% negative. Private funding sources were six times more likely in positive articles than negative articles (figure 1(d)).
Within our database, the words 'gene flow' (F_{2,247} = 0.95, p = 0.39) and 'cancer' (F_{2,247} = 1.35, p = 0.26) were mentioned more frequently in articles with a negative stance than those with a neutral or positive stance, though not significantly more (figure 2). Bt crops were mentioned more frequently than glyphosate-resistant crops and negative articles mentioned both Bt and glyphosate more often than neutral or positive articles (F_{2,247} = 2.00, p = 0.14; figures 3(a) and (b)). The ratio of the number of times that Bt was mentioned relative to glyphosate was five times higher in positive articles than negative and neutral articles (figure 3(c)).

Linguistic features
Characteristic linguistic features were identified in articles with a positive, negative, and neutral stance (figure 4). For example, linguistic features that differed significantly between positive and negative articles included that-complement clauses, the definite article 'the', and modals of possibility (e.g. may, might, could). These constellations of features can be used to take a strong stand either for or against a particular point of view; in this case they are used to cast doubt or raise possible problems with GM crops. The text excerpt below, from a negative article, shows these features working together to cast doubt or caution against GM crops. That-complements have been underlined, and possibility modals and 'the' are bold.
However, some studies have shown that the host genotype might affect the community of microorganisms that establish a symbiotic relationship with the plant, and that any alteration in the microbial community diversity or activity might have significant effects on the plant's ability to grow and adapt [45].
Figure 1 (caption fragment): (a) Does not include articles with no consensus (articles without a three quarters majority agreement) and articles that did not discuss GM crops in the main text (e.g. articles that only use genetic modification terms in references). (b) Cumulative sum of rate of publication for each stance. Articles with a positive, negative, or neutral stance were normalized to a cumulative sum of the rate of publication (black line represents a consistent rate). Lines with a steep slope represent a period in time with a high rate of publication, a flat slope indicates no publications for that particular stance. (c) The percent of articles with a neutral, positive, or negative stance categorized by the country or region of origin of the first author. Papers by authors from China, USA, and the EU represented the three highest publication records for articles with a consensus. All other countries are grouped into 'Other'. (d) Relative proportion of funding sources for articles with a neutral, positive or negative stance toward GM crops. Funding sources were determined by the identity of the first source of funding within each article. Numbers above or on the bars indicate the number of articles for each category.
In contrast, the excerpt below from a positive article reflects linguistic characteristics that were found to be significantly different from negative articles. For comparison of these two excerpts of similar length (46 and 47 words) that reflect the typical linguistic patterns found in negative and positive articles, the same features have been identified in both excerpts. In the excerpt below the proper nouns have been bolded and underlined. 'The' is in bold.
The main pests of poplar (primarily Lepidoptera) were obviously inhibited in the transgenic poplar; however, the number of Coleoptera pests was generally low, and the inhibitory effect was not obvious. At the same time, there was little influence on the number of natural enemies and neutral insects [46].
Comparing features in these two samples, we see that the use of the definite article 'the' is the same in these two examples, five in each. However, the negative article sample has two that-complement clauses while the positive article has none. Similarly, the positive article has four proper nouns while the negative article has none. The positive article does not have any possibility modals while the negative excerpt has two.
The linguistic analysis of sets of features helps to demonstrate how several linguistic features work together to craft points of view for or against a particular position, or to create a more neutral position. We identified a list of key linguistic features that can be used for automated classifications of stance within peer-reviewed literature discussing GM crops (table S5 and figure 4). Supervised classification with random forest models using the top 12 linguistic features reported in table S5 and visualized in figure 4 predicted the stance of articles 64.2% of the time, which is significantly better than random chance (44%).
Figure 3. Summary of the number of times a GM crop was mentioned for each stance. Counts of sequences of characters (strings) for two common transgenic genes, Bt Cry proteins (a) and glyphosate resistance (b). Bt word counts divided by glyphosate word counts (c). Bt was searched using the ' bt ' string (with spaces), while Roundup was searched using the 'glyphosate' string. Error bars represent a bootstrapped 95% confidence interval.

Citation patterns
The reference sections of positive and negative articles were analyzed to determine if articles with a similar stance tend to have similar citations. A total of 3987 references were collected from 64 articles; positive articles contained 3205 citations (averaging 72.8 per article), while negative articles contained 999 (averaging 50.0 per article). There was no statistical difference in the number of citations per article (figure S1; F_{1,184} = 1.73, p = 0.19). Overall, 3805 references were cited once, 154 references were cited twice, 20 references were cited three times, four references were cited four times, and three references were cited five times. Averaged PERMANOVA results for randomly sampled positive versus negative references indicate that there is a subtle yet significant difference between the bibliographies (F_{1,62} = 1.29, p = 0.017) (figure S2). Six articles were significantly associated with negative stance articles, according to an indicator species analysis, five of which discussed gene flow (table S7).

Discussion
Our results indicate that 40% of articles discussing GM crops and the environment have a positive or negative stance, with twice as many articles with a positive stance (26.1%) as with a negative stance (13.5%; figure 1(a)). The proportions of peer-reviewed journal articles with detectable stance toward GM crops vary with year of publication, authors' country of origin, funding source and type of genetic modification. Distinctive linguistic features within articles of a particular stance (figure 4) suggest that positive and negative articles can be detected by parsing the meaning and choice of words within an article, and we used these features to successfully predict the stance of an article.
Our survey reveals variation in scientific stance about GM crops over time. Articles with a negative stance toward GM crops were more likely to be published in the beginning of the millennium, while positive and neutral articles had relatively stable publication rates ( figure 1(b)). Changes in stance over time could represent a shift in scientific consensus on the safety of GM crops. The trend in negative stance articles could be influenced by the negative public attitudes toward GM food in the late 90s that began to dissipate in the early 2000s [47][48][49].
The ratio of positive:negative articles about GM crops varies among countries and regions ( figure 1(c)), and may or may not corroborate public opinion polls. The ratio for authors from the European Union (5:7) reflects the negative public opinion about GM products and political efforts to ban GM crops in many EU countries [50]. In contrast, there is a disparity between public opinion about GM crops and the stance of scientific publications by Chinese authors. Articles by Chinese authors had the highest ratio of positive to negative stance (8:1), yet a recent study indicates that 41.4% of the Chinese public oppose and only 11.9% support GM crops [13,51]. This incongruence could reflect the push for investment in GM crops from China's government, as China was the first country to approve a commercial GM crop [52,53]. Alternatively, English is a second language for most Chinese authors, and this may influence subtle linguistic features which inadvertently makes the stance of their writing more positive. More research is needed to investigate potential linkages between the political environment and the stance of scientists.
The proportion of papers with positive and negative stance varied with funding sources and type of genetic modification. Positive articles were three and six times more likely to be funded by private sources than neutral and negative stance articles, respectively ( figure 1(d)). Our study discovered some common themes of positive and negative stance toward GM crops. Early concerns over the potential for modified genes from GM crops to assimilate into populations of wild relatives could account for our finding that negative articles were four times more likely to mention gene flow ( figure 2(a)). The ongoing debate about the long-term safety of glyphosate exposure [54] may be reflected in the observation that articles with a negative stance were 85 times more likely to mention cancer compared to positive articles ( figure 2(b)). On the other hand, a positive stance toward GM crops in our database could relate to generally more positive attitudes toward Bt crops, as opposed to glyphosate-resistant crops (figure 3). The genetic modification Bt is considered to be a relatively safe substitute for toxic pesticides [9], and positive articles mentioned Bt ten times more often than glyphosate, compared to negative and neutral articles which mention Bt only twice as often.
Machine learning has the potential to automatically screen manuscripts. The linguistic features identified here could be applied broadly to predict the stance of any body of English texts discussing GM crops. Together, the results of this analysis of bias can help us understand features behind the debate over the safety of GM crops, and also develop automated methods to screen manuscripts for unintended stance. Our machine learning algorithm could benefit from improved accuracy, possibly with more data, or neural network classifiers. During the classification, a significant portion of the journal articles did not have consensus partly because we skimmed articles and did not analyze each article in depth. Also, one person may not pick up on the same linguistic features as another person. Some articles contained linguistic features that indicated both positive and negative stances within the same article. The scope of this study was limited to a small subset of journal articles that discussed GM crops within the context of the environment. Our method could also be improved by including articles from other controversial topics, such as climate change. If similar linguistic features are used in other topics, an automated tool could be used to quickly pinpoint linguistic features that may have the unintended consequence of taking a stance rather than objectively reporting facts, if that is considered desirable. Whether stance on scientific topics is appropriate for peer-reviewed journal articles is beyond the scope of this study.
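When judging whether more data or a different classifier actually improves accuracy, one option is to compare the number of correctly classified held-out articles against a chance baseline. The Python sketch below uses an exact one-sided binomial test for this purpose. It is an assumption for illustration, not necessarily the test used in this study, and the function name, the number of articles, and the chance level are all hypothetical. Because 60% of the classified articles were neutral, a majority-class baseline could be a stricter comparison than the uniform chance level assumed here.

```python
from math import comb


def binomial_tail(successes: int, trials: int, chance: float) -> float:
    """Probability of at least `successes` correct out of `trials` by chance alone.

    Exact upper tail P(X >= successes) for X ~ Binomial(trials, chance).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    if not 0.0 <= chance <= 1.0:
        raise ValueError("chance must be a probability")
    return sum(
        comb(trials, k) * chance**k * (1.0 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical evaluation (NOT results from this study):
# 60 held-out articles, 32 classified correctly, three stance classes
# assumed equally likely under random guessing.
p_value = binomial_tail(successes=32, trials=60, chance=1 / 3)
print(f"One-sided p-value against chance: {p_value:.4f}")
```

A small p-value indicates only that accuracy exceeds the assumed chance level; it does not show that the accuracy is high enough for screening manuscripts in practice.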

Conclusions
Our analysis of the stance toward GM crops within peer-reviewed scientific literature highlights characteristics of writing and the contextual environment. Despite the limitations of this study, we discovered that a significant portion of peer-reviewed journal articles on GM crops contained a positive or negative stance, and that articles with a positive stance were six times more likely to be funded by private sources than articles with a negative stance. We also learned that articles discussing the Bt genetic modification corresponded to positive stances. Public opinion polls and the indirect opinions highlighted in our study suggest that political and social factors may play a role in the ratio of positive to negative stance articles from different regions. However, just how much the political and social environment influences the stance of scientists remains a mystery.

Data availability statement
All metadata for classified articles, the articles themselves, and random forest model code will be made available on the Dryad Digital Repository upon acceptance.
The data that support the findings of this study are available at https://gitlab.com/usda-ars/gmo-stance/.

Acknowledgments
This research was initiated during a graduate seminar on Ecosystem Health and GMOs (BIO 698) when we noticed evidence of bias within peer-reviewed scientific literature. Thanks to undergraduate interns Parker Leber, Cat Watkins, Morgan Brion, Haley Harris, and Kayla Thompson for their help downloading and classifying articles. Another thanks goes to David Minkler for his sage advice and classification of articles. We also thank Mary DeJong for her help in developing the journal article search strings and identifying articles in library databases. Special thanks to Doug Biber for his help tagging our texts. This research was partially funded by a grant to BMS from Northern Arizona University's School of Earth and Sustainability and support to NCJ through a Bullard Fellowship from Harvard Forest. Finally, BMS would like to thank the Skeptics' Guide to the Universe for planting the seed that grew into this publication.

Conflict of interest
B M S and M L have a positive stance toward GM crops. N C J and A R have a negative stance toward GM crops. A P, K G, and R R have a neutral stance toward GM crops.