Natural Language Processing for Policymaking

Language is the medium for many political activities, from campaigns to news reports. Natural language processing (NLP) uses computational tools to parse text into key information that is needed for policymaking. In this chapter, we introduce common methods of NLP, including text classification, topic modeling, event extraction, and text scaling. We then overview how these methods can be used for policymaking through four major applications including data collection for evidence-based policymaking, interpretation of political decisions, policy communication, and investigation of policy effects. Finally, we highlight some potential limitations and ethical concerns when using NLP for policymaking. This text is from Chapter 7 (pages 141-162) of the Handbook of Computational Social Science for Policy (2023). Open Access on Springer: https://doi.org/10.1007/978-3-031-16624-2


Introduction
Language is an important form of data in politics. Constituents express their stances and needs in text such as social media and survey responses. Politicians conduct campaigns through debates, statements of policy positions, and social media. Government staff need to compile information from various documents to assist in decision-making. Textual data is also prevalent through the documents and debates in the legislation process, negotiations and treaties to resolve international conflicts, and media such as news reports, social media, party platforms, and manifestos.
Natural language processing (NLP) is the study of computational methods to automatically analyze text and extract meaningful information for subsequent analysis. The importance of NLP for policymaking has been highlighted since the last century (Gigley, 1993). With the recent success of NLP and its versatility over tasks such as classification, information extraction, summarization, and translation (Devlin et al., 2019; Brown et al., 2020), there is a rising trend to integrate NLP into policy decisions and public administration (Misuraca et al., 2020; Engstrom et al., 2020; Van Roy et al., 2021). Main applications include extracting useful, condensed information from free-form text (Engstrom et al., 2020), and analyzing sentiment and citizen feedback with NLP (Biran et al., 2022), as in many projects funded under EU Horizon (European Commission, 2017). Driven by the broad applications of NLP (Jin et al., 2021a; Gonzalez et al., 2022), the research community has also started to connect NLP with various social applications in the fields of computational social science (Lazer et al., 2009; Shah et al., 2015; Engel et al., 2021; Luz, 2022) and political science in particular (Grimmer and Stewart, 2013; Glavaš et al., 2019).
Fig. 1 Overview of NLP for policymaking. Text sources shown in the figure: news, press releases, social media, legislation, and campaigns.
We show an overview of NLP for policymaking in Figure 1. According to this overview, the chapter will consist of three parts. First, we introduce in Section 2 NLP methods that are applicable to political science, including text classification, topic modeling, event extraction, and score prediction. Next, we cover a variety of cases where NLP can be applied to policymaking in Section 3. Specifically, we cover four stages: analyzing data for evidence-based policymaking, improving policy communication with the public, investigating policy effects, and interpreting political phenomena to the public. Finally, we will discuss limitations and ethical considerations when using NLP for policymaking in Section 4.

NLP for Text Analysis
NLP brings powerful computational tools to analyze textual data (Jurafsky and Martin, 2000). According to the type of information that we want to extract from the text, we introduce four different NLP tools to analyze text data: text classification (by which the extracted information is the category of the text), topic modeling (by which the extracted information is the key topics in the text), event extraction (by which the extracted information is the list of events mentioned in the text), and score prediction (where the extracted information is a score of the text). Table 1 lists each method with the type of information it can extract and some example application scenarios, which we will detail in the following subsections.
Fig. 2 The usage and example applications of text classification on political text.

Text Classification
As one of the most common types of text analysis methods, text classification reads in a piece of text and predicts its category using an NLP text classification model, as in Figure 2.
There are many off-the-shelf tools for text classification (Yin et al., 2019; Brown et al., 2020; Loria, 2018), such as implementations using the Python package transformers (Wolf et al., 2020). A well-known subtask of text classification is sentiment classification (also known as sentiment analysis, or opinion mining), which aims to distinguish the subjective information in the text, such as positive or negative sentiment (Pang and Lee, 2007). However, the existing tools only do well in categories that are easy to predict. If the categorization is customized and very specific to a study context, then there are two common solutions. One is to use dictionary-based methods, which rely on a list of frequent keywords that correspond to a certain category (Albaugh et al., 2013) or on general linguistic dictionaries such as the Linguistic Inquiry and Word Count (LIWC) dictionary (Pennebaker et al., 2001). The second is to adopt a data-driven pipeline, which requires human hand-coding of documents into a predetermined set of categories, training an NLP model to learn the text classification task (Sun et al., 2019), and verifying the performance of the NLP model on a held-out subset of the data, as introduced in Grimmer and Stewart (2013). An example of adapting state-of-the-art NLP models to a customized dataset is demonstrated in this guide.
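To make the dictionary-based approach concrete, the following is a minimal Python sketch that assigns a text to the category whose keywords it mentions most often, and then checks the result against a few hand-coded examples. The keyword lists, the category names, and the functions `classify` and `accuracy` are hypothetical and chosen only for illustration; they are not taken from Albaugh et al. (2013) or from LIWC.

```python
import re
from collections import Counter

# Hypothetical keyword dictionary (category -> indicative keywords).
# These lists are illustrative assumptions, not a published dictionary.
ISSUE_DICTIONARY = {
    "health": {"hospital", "vaccine", "insurance", "doctor", "medicare"},
    "economy": {"tax", "jobs", "inflation", "budget", "trade"},
    "environment": {"climate", "emissions", "pollution", "renewable", "forest"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text, dictionary=ISSUE_DICTIONARY):
    """Return the category with the most keyword hits, or None if there are none."""
    counts = Counter()
    for token in tokenize(text):
        for category, keywords in dictionary.items():
            if token in keywords:
                counts[category] += 1
    if not counts:
        return None
    # max() keeps the first maximum, so ties fall back to dictionary order.
    return max(dictionary, key=lambda category: counts[category])


def accuracy(labeled_examples):
    """Share of hand-coded (text, label) pairs that the dictionary reproduces."""
    correct = sum(classify(text) == label for text, label in labeled_examples)
    return correct / len(labeled_examples)


# A tiny, made-up set of hand-coded examples used only to check the dictionary.
HAND_CODED = [
    ("Lower tax and more jobs", "economy"),
    ("Cut emissions to fight climate change", "environment"),
]
print(classify("The bill raises the tax on trade and cuts the budget."))
print(accuracy(HAND_CODED))
```

The same `accuracy` check applies unchanged to a trained model: only `classify` would be swapped for the model's prediction function, and the hand-coded examples would be a held-out subset that was not used for training.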
Using the text classification method, we can automate many types of analyses in political science. As listed in the examples in Figure 2, researchers can detect the political perspective of news articles (Huguet Cabot et al., 2020), the stance in media on a certain topic (Luo et al., 2020), whether campaigns use positive or negative sentiment (Ansolabehere and Iyengar, 1995), which issue area a piece of legislation addresses (Adler and Wilkerson, 2011), topics in parliament speech (Albaugh et al., 2013; Osnabrügge et al., 2021), congressional bills (Hillard et al., 2008; Collingwood and Wilkerson, 2012) and political agenda (Karan et al., 2016), whether an international statement is peaceful or belligerent (Schrodt, 2000), whether a speech contains positive or negative sentiment (Schumacher et al., 2016), and whether a U.S. Circuit Courts case decision is conservative or liberal (Hausladen et al., 2020). Moreover, text classification can also be used to categorize the type of language devices that politicians use, such as what type of framing the text uses (Huguet Cabot et al., 2020), and whether a tweet uses political parody (Maronikolakis et al., 2020).

Topic Modeling
Topic modeling is a method to uncover a list of frequent topics in a corpus of text. For example, news articles that are against vaccination might frequently mention the topic "autism," whereas news articles supporting vaccination will be more likely to mention "immune" and "protective." One of the most widely used models is Latent Dirichlet Allocation (LDA) (Blei et al., 2001), which is available in Python through packages such as Gensim (often used together with NLTK for text preprocessing), as in this guide.
Specifically, LDA is a probabilistic model that models each topic as a mixture of words, and each textual document can be represented as a mixture of topics. As in Figure 3, given a collection of textual documents, LDA topic modeling generates a list of topic clusters, for which the number of topics can be customized by the analyst. In addition, if needed, LDA can also produce a representation of each document as a weighted list of topics. While often the number of topics is predetermined by the analyst, this number can also be dynamically determined by measuring the perplexity of the resulting topics. In addition to LDA, other topic modeling algorithms have been used extensively, such as those based on principal component analysis (PCA) (Chung and Pennebaker, 2008).
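To illustrate what LDA computes, the sketch below fits a small LDA model with collapsed Gibbs sampling in plain Python and reports the perplexity of the fitted model. It is a teaching sketch under stated assumptions rather than a replacement for a library implementation: the toy corpus, the hyperparameters `alpha` and `beta`, and the function names `lda_gibbs` and `perplexity` are all hypothetical, and the perplexity is computed on the training documents for simplicity, whereas a held-out set would be preferable when choosing the number of topics.

```python
import math
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on a list of token lists.

    Returns (doc_topic, topic_word, vocab): doc_topic[d][k] is the weight of
    topic k in document d, topic_word[k][v] is the probability of vocab[v]
    under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens per topic
    z = []                              # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token, then resample its topic given all others.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return doc_topic, topic_word, vocab


def perplexity(docs, doc_topic, topic_word, vocab):
    """Per-token perplexity of the documents under the fitted model."""
    word_id = {w: i for i, w in enumerate(vocab)}
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            v = word_id[w]
            p = sum(doc_topic[d][k] * topic_word[k][v] for k in range(len(topic_word)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


# Made-up toy corpus echoing the vaccination example in the text.
DOCS = [
    "vaccine autism risk autism fear".split(),
    "vaccine immune protective immune safe".split(),
    "autism fear risk vaccine".split(),
    "protective immune safe vaccine".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(DOCS, num_topics=2)
print(perplexity(DOCS, doc_topic, topic_word, vocab))
```

Running `lda_gibbs` for several values of `num_topics` and comparing the resulting perplexities is one simple way to let the data inform the number of topics.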

Event Extraction
Event extraction is the task of extracting a list of events from a given text. It is a subtask of a larger domain of NLP called information extraction (Manning et al., 2008). For example, the sentence "Israel bombs Hamas sites in Gaza" expresses an event "Israel --bombs--> Hamas sites" with the location "Gaza." Event extraction usually incorporates both entity extraction (e.g., Israel, Hamas sites, and Gaza in the previous example) and relation extraction (e.g., "bombs" in the previous example). Event extraction is a handy tool to monitor events automatically, such as detecting news events (Walker et al., 2006; Mitamura et al., 2017), and detecting international conflicts (Azar, 1980; Trappl, 2006). To foster research on event extraction, there have been substantial efforts in textual data collection (McClelland, 1976; Schrodt and Hall, 2006; Merritt et al., 1993; Raleigh et al., 2010; Sundberg and Melander, 2013), event coding schemes to accommodate different political events (Goldstein, 1992; Bond et al., 1997; Gerner et al., 2002), and dataset validity assessment (Schrodt and Gerner, 1994).
As for event extraction models, similar to text classification models, there are off-the-shelf tools such as the Python packages stanza (Qi et al., 2020) and spaCy (Honnibal et al., 2020). In case of customized sets of event types, researchers can also train NLP models on a collection of textual documents with event annotations (Hogenboom et al., 2011; Liu et al., 2020, inter alia).
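As a rough illustration of the output format of event extraction, the sketch below uses a small, hand-written trigger lexicon and a regular expression to pull a source, a trigger, a target, and an optional location out of simple subject-verb-object headlines. This is a deliberately simplified, rule-based stand-in: the lexicon `TRIGGERS`, the event type names, and the function `extract_event` are hypothetical, and real systems would typically rely on the parsers and entity recognizers of packages such as stanza or spaCy instead.

```python
import re

# Hypothetical lexicon of trigger verbs mapped to coarse event types.
TRIGGERS = {
    "bombs": "attack",
    "attacks": "attack",
    "sanctions": "sanction",
    "visits": "visit",
    "meets": "meeting",
}

# source <trigger> target [in location], with an optional final period.
PATTERN = re.compile(
    r"^(?P<source>.+?)\s+(?P<trigger>" + "|".join(TRIGGERS) + r")\s+"
    r"(?P<target>.+?)(?:\s+in\s+(?P<location>.+?))?\.?$"
)


def extract_event(sentence):
    """Return a dict describing the event in a simple headline, or None."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return {
        "source": match["source"],
        "type": TRIGGERS[match["trigger"]],
        "trigger": match["trigger"],
        "target": match["target"],
        "location": match["location"],  # None when no location is given
    }


print(extract_event("Israel bombs Hamas sites in Gaza"))
```

The source, target, and location fields correspond to the entity extraction step, and the trigger corresponds to the relation between them. A pattern like this breaks on passive voice, nested clauses, or unseen verbs, which is one reason annotated corpora and trained models are used for customized event types.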

Score Prediction
NLP can also be used to predict a score given input text. A useful application is political text scaling, which aims to predict a score (e.g., left-to-right ideology, emotionality, and different attitudes towards the European integration process) for a given piece of text (e.g., political speeches, party manifestos, and social media posts) (Laver et al., 2003; Lowe et al., 2011; Slapin and Proksch, 2008; Gennaro and Ash, 2021, inter alia).
Traditional models for text scaling include Wordscores (Laver et al., 2003) and WordFish (Slapin and Proksch, 2008; Lowe et al., 2011). Recent NLP models represent the text by high-dimensional vectors learned by neural networks to predict the scores (Glavaš et al., 2017b; Nanni et al., 2019). One way to use the NLP models is to apply off-the-shelf general-purpose models such as InstructGPT (Ouyang et al., 2022) and design a prompt to specify the type of the scaling to the API, or borrow existing, trained NLP models if the same type of scaling has been studied by previous researchers. Another way is to collect a dataset of text with hand-coded scales, and train NLP models to learn to predict the scale, similar to the practice in Slapin and Proksch (2008); Gennaro and Ash (2021), inter alia.
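To give a sense of how a traditional scaling model works, the sketch below implements the core of the Wordscores idea: each word receives a score equal to the average of the reference-text positions, weighted by how strongly the word is associated with each reference text, and a new text is scored by the frequency-weighted mean of its word scores. The toy reference texts, their positions of -1 and +1, and the function names are hypothetical, and the sketch omits the rescaling of new-text scores and the uncertainty estimates described by Laver et al. (2003).

```python
from collections import Counter


def word_scores(reference_texts, reference_positions):
    """Score each word from tokenized reference texts with known positions."""
    rel_freqs = []
    for tokens in reference_texts:
        counts = Counter(tokens)
        total = sum(counts.values())
        rel_freqs.append({w: c / total for w, c in counts.items()})
    vocab = set().union(*rel_freqs)
    scores = {}
    for w in vocab:
        total_freq = sum(f.get(w, 0.0) for f in rel_freqs)
        # P(reference r | word w) times the position of r, summed over r.
        scores[w] = sum(
            f.get(w, 0.0) / total_freq * position
            for f, position in zip(rel_freqs, reference_positions)
        )
    return scores


def score_text(tokens, scores):
    """Frequency-weighted mean score of the scored words in a new text."""
    counts = Counter(w for w in tokens if w in scores)
    total = sum(counts.values())
    if total == 0:
        return None  # no overlap with the reference vocabulary
    return sum(c / total * scores[w] for w, c in counts.items())


# Made-up reference texts: one placed at -1 (left) and one at +1 (right).
REFERENCES = ["welfare welfare equality".split(), "market market tax".split()]
POSITIONS = [-1.0, 1.0]
SCORES = word_scores(REFERENCES, POSITIONS)
print(score_text("welfare welfare equality tax".split(), SCORES))
```

Words that never occur in the reference texts are ignored when scoring, so the quality of the estimate depends heavily on how well the reference texts cover the vocabulary of the new text.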

Using NLP for Policymaking
In the political domain, there are large amounts of textual data to analyze (Neuendorf and Kumar, 2015), such as parliament debates (Van Aggelen et al., 2017), speeches (Schumacher et al., 2016), legislative text (Baumgartner et al., 2006; Bevan, 2017), databases of political parties worldwide (Döring and Regel, 2019), and expert survey data (Bakker et al., 2015). Since it is tedious to hand-code all textual data, NLP provides a low-cost tool to automatically analyze such massive text.
In this section, we will introduce how NLP can facilitate four major areas to help policymaking: before policies are made, researchers can use NLP to analyze data and extract key information for evidence-based policymaking (Section 3.1); after policies are made, researchers can interpret the priorities among and reasons behind political decisions (Section 3.2); researchers can also analyze features in the language of politicians when communicating the policies to the public (Section 3.3); finally, after the policies have taken effect, researchers can investigate the effectiveness of the policies (Section 3.4).

Analyzing Data for Evidence-Based Policymaking
A major use of NLP is to extract information from large collections of text. This function can be very useful for analyzing the views and needs of constituents, so that policymakers can make decisions accordingly.
Fig. 4 NLP to analyze data for evidence-based policymaking.
As in Figure 4, we will explain how NLP can be used to analyze data for evidence-based policymaking from three aspects: data, information to extract, and political usage.
Data. Data is the basis of such analyses. Large amounts of textual data can reveal information about constituents, media outlets, and influential figures. The data can come from a variety of sources, including social media such as Twitter and Facebook, survey responses, and news articles.
To extract such information from text, we can often utilize the main NLP tools introduced in Section 2, including text classification, topic modeling, event extraction and score prediction (especially text scaling to predict left-to-right ideology). In NLP literature, social media, such as Twitter, is a popular source of textual data to collect public opinions (Thelwall et al., 2011; Paltoglou and Thelwall, 2012; Pak and Paroubek, 2010; Arunachalam and Sarkar, 2013; Rosenthal et al., 2015).
Political Usage. Such information extracted from data is highly valuable for political usage. For example, voters' sentiment, stance, and ideology are important supplements to traditional polls and surveys for gathering information about constituents' political leanings. Identifying the needs expressed by people is another important survey target, which helps politicians understand what needs they should address and match those needs with the available resources (Hiware et al., 2020).
More specific political uses include understanding public opinion on parties and the president, as well as on certain topics. The public sentiment towards parties (Pla and Hurtado, 2014) and the President (Marchetti-Bowick and Chambers, 2012) can supplement traditional approval rating surveys, and stances towards certain topics (Gottipati et al., 2013; Stefanov et al., 2020; Luo et al., 2020) can be important information for legislators to make decisions on debatable issues such as abortion, taxes, and legalization of same-sex marriage. Many existing studies use NLP on social media text to predict election results (O'Connor et al., 2010; Beverungen and Kalita, 2011; Unankard et al., 2014; Mohammad et al., 2015; Tjong Kim Sang and Bos, 2012). In general, big-data-driven analyses can help decision-makers collect more feedback from people and society, enabling policymakers to be closer to citizens, and increase transparency and engagement in political issues (Arunachalam and Sarkar, 2013).

Interpreting Political Decisions
After policies are made, political scientists and social scientists can use textual data to interpret political decisions. As in Figure 5, there are two major use cases: mining political agendas, and discovering policy responsiveness.

Mining Political Agendas.
Researchers can use textual data to infer a political agenda, including the topics that politicians prioritize, political events, and different political actors' stances on certain topics. Such data can come from press releases, legislation, and electoral campaigns.
Examples of previous studies that analyze the topics and prioritization of political bodies include research on the prioritization each Senator assigns to topics using press releases (Grimmer, 2010b), topics in different parties' electoral manifestos (Glavaš et al., 2017a), topics in EU parliament speeches (Lauscher et al., 2016) and various other types of text (King and Lowe, 2003; Hopkins and King, 2010; Grimmer, 2010a; Roberts et al., 2014), as well as political event detection from congressional text and news (Nanni et al., 2017).
Further studies look into how political interests affect legislative behavior. Legislators tend to show strong personal interest in the issues that come before their committees (Fenno, 1973), and Mayhew (2004) identifies that Senators relying on appropriations secured for their state have a strong incentive to support legislation that allows them to secure particularistic goods.

Discovering Policy Responsiveness.
Policy responsiveness is the study of how policies respond to different factors, such as how changes in public opinion lead to responses in public policy (Stimson et al., 1995). One major direction is that politicians tend to make policies that align with the expectations of their constituents, in order to win re-election in the next term (Canes-Wrone et al., 2002). Studies show that policy preferences of the state public can be a predictor of future state policies (Caughey and Warshaw, 2018). For example, Lax and Phillips (2009) show that more LGBT tolerance leads to more pro-gay legislation in response.
A recent study by Jin et al. (2021b) uses NLP to analyze over 10 million COVID-19-related tweets targeted at US governors; using classification models to obtain the public sentiment, they study how public sentiment leads to political decisions of COVID-19 policies made by US governors. Such use of NLP on massive textual data contrasts with the traditional studies of policy responsiveness which span over several decades and use manually collected survey results (Caughey and Warshaw, 2018; Lax and Phillips, 2009, 2012).

Improving Policy Communication with the Public
Policy communication is the study of how politicians present policies to their constituents. As in Figure 6, common research questions in policy communication include how politicians establish their images (Fenno, 1978) such as campaign strategies (Petrocik, 1996; Simon, 2002; Sigelman and Buell Jr, 2004), how constituents allocate credit, what receives attention in Congress (Sulkin, 2005), and what receives attention in news articles (Semetko and Valkenburg, 2000; McCombs and Valenzuela, 2004; Armstrong et al., 2006).
Based on data from press releases, political statements, electoral campaigns and news articles, researchers usually analyze two types of information: the language techniques politicians use, and the contents such as topics and underlying moral foundations in these textual documents.

Language Techniques. Policy communication largely focuses on the types of languages that
Other data sources used in policy communication research include surveys of Senate staffers (Cook, 1988), newsletters that legislators send to constituents (Lipinski, 2009) and so on. For example, previous studies analyze what portions of political texts are position-taking versus credit-claiming (Grimmer et al., 2012; Grimmer, 2013), whether the claims are vague or concrete (Baerg et al., 2018; Eichorst and Lin, 2019), the frequency of credit-claiming messages versus the actual amount of contributions (Grimmer et al., 2012), and whether politicians tend to make credible or dishonorable promises (Grimmer, 2010b). Within the political statements, it is also interesting to check the ideological proportions (Sim et al., 2013), and how politicians make use of dialectal variations and code-mixing (Sravani et al., 2021).
The representation styles usually affect the effectiveness of policy communication, such as the role of language ambiguity in framing the political agenda (Page, 1976;Campbell, 1983), and the effect of credit-claiming messages on constituents' allocation of credit (Grimmer et al., 2012).
Contents. The contents of policy communication include the topics in the political statements, such as what Senators discuss in floor statements (Hill and Hurley, 2002), and what Presidents address in daily speeches (Lee, 2008), and also the moral foundations used by politicians underlying their political tweets (Johnson and Goldwasser, 2018).
Using the extracted content information, researchers can explore further questions such as whether competing politicians or political elites emphasize the same issues (Petrocik, 1996; Gabel and Scheve, 2007), and how the priorities politicians articulate co-vary with the issues discussed in the media (Bartels, 1996). Another open research direction is to analyze the interaction between newspapers and politicians' messages, such as how often newspapers cover a certain politician's message and in what way, and how such coverage affects incumbency advantage.
Meaningful Future Work. Apart from analyzing the language of existing political texts that aims to maximize political interests, an advanced question that is more meaningful to society is how to improve policy communication to steer towards a more beneficial future for society as a whole. There is relatively little research on this, and we welcome future work on this meaningful topic.

Investigating Policy Effects
After policies take effect, it is important to collect feedback or evaluate the effectiveness of policies. Existing studies evaluate the effects of policies along different dimensions: one dimension is the change in public sentiment, which can be analyzed by comparing the sentiment classification results before and after policies, following a similar paradigm in Section 3.1. There are also studies on how policies affect the crowd's perception of the democratic process (Miller et al., 1990).
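The before-and-after comparison of sentiment can be sketched in a few lines of Python. The example assumes that each post has already been assigned a sentiment label of -1, 0, or +1 by some classifier; the posts, the dates, and the function `sentiment_shift` are hypothetical. The result is a descriptive difference in mean sentiment, not a causal estimate of the policy's effect.

```python
from datetime import date
from statistics import mean


def sentiment_shift(labeled_posts, policy_date):
    """Mean sentiment on or after the policy date minus mean sentiment before it.

    labeled_posts: iterable of (date, sentiment) pairs, sentiment in {-1, 0, 1}.
    """
    posts = list(labeled_posts)
    before = [s for d, s in posts if d < policy_date]
    after = [s for d, s in posts if d >= policy_date]
    if not before or not after:
        raise ValueError("need posts both before and after the policy date")
    return mean(after) - mean(before)


# Made-up classifier outputs around a hypothetical policy date of 2021-03-01.
POSTS = [
    (date(2021, 2, 1), -1),
    (date(2021, 2, 10), -1),
    (date(2021, 2, 20), 1),
    (date(2021, 3, 5), 1),
    (date(2021, 3, 9), 1),
    (date(2021, 3, 15), 0),
]
print(sentiment_shift(POSTS, date(2021, 3, 1)))
```

A positive value indicates that the labeled posts became more favorable after the policy date. Other events in the same period can drive the same shift, so a comparison like this is best read alongside other evidence.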
Another dimension is how policies result in economic changes. Calvo-González et al. (2018) investigate the negative consequences of policy volatility that harm long-term economic growth. Specifically, to measure policy volatility, they first obtain main topics by topic modeling on presidential speeches, and then analyze how the significance of topics changes over time.
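One simple way to turn topic-model output into a volatility measure is sketched below. It averages the per-speech topic weights within each year and then takes the mean year-to-year total variation distance between those yearly topic distributions. This particular definition, the function names, and the toy numbers are our own assumptions for illustration; they are not necessarily the measure used by Calvo-González et al. (2018).

```python
from collections import defaultdict


def topic_shares_by_year(speeches):
    """Average the topic weights of all speeches given in the same year.

    speeches: iterable of (year, topic_weights), where topic_weights is the
    per-document topic mixture produced by a topic model.
    """
    grouped = defaultdict(list)
    for year, weights in speeches:
        grouped[year].append(weights)
    return {
        year: [sum(column) / len(rows) for column in zip(*rows)]
        for year, rows in grouped.items()
    }


def topic_volatility(shares_by_year):
    """Mean total variation distance between consecutive years' topic shares."""
    years = sorted(shares_by_year)
    if len(years) < 2:
        raise ValueError("need at least two years to measure change")
    changes = [
        0.5 * sum(abs(a - b) for a, b in zip(shares_by_year[prev], shares_by_year[curr]))
        for prev, curr in zip(years, years[1:])
    ]
    return sum(changes) / len(changes)


# Made-up topic mixtures for four speeches over three years (two topics).
SPEECHES = [
    (2000, [0.5, 0.5]),
    (2001, [0.4, 0.6]),
    (2001, [0.6, 0.4]),
    (2002, [1.0, 0.0]),
]
print(topic_volatility(topic_shares_by_year(SPEECHES)))
```

Values near 0 mean the agenda is stable from year to year, and values near 1 mean it is almost entirely replaced each year.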

Limitations and Ethical Considerations
There are several limitations that researchers and policymakers need to take into consideration when using NLP for policymaking, due to the data-driven and black-box nature of modern NLP. First, the effectiveness of the computational models relies on the quality and comprehensiveness of the data. Although many political discourses are public, including data sources such as news, press releases, legislation, and campaigns, when it comes to surveying public opinions, social media might be a biased representation of the whole population. Therefore, when making important policy decisions, the traditional polls and surveys can provide more comprehensive coverage. Note that in the case of traditional polls, NLP can still be helpful in expediting the processing of survey answers.
The second concern is the black-box nature of modern NLP models. We do not encourage decision-making systems to depend fully on NLP, but suggest that NLP can assist human decision-makers. Hence, all the applications introduced in this chapter use NLP to compile information that is necessary for policymaking instead of directly suggesting a policy. Nonetheless, some of the models are hard to interpret or explain, such as text classification using deep learning models (Yin et al., 2019; Brown et al., 2020), which could be vulnerable to adversarial attacks by small paraphrasing of the text input (Jin et al., 2020). In practical applications, it is important to ensure the trustworthiness of the usage of AI. There could be a preference for transparent machine learning models if they can do the work well (e.g., LDA topic models, and traditional classification methods using dictionaries or linguistic rules), or tasks with well-controlled outputs such as event extraction to select spans of the given text that mention events. In cases where only the deep learning models can provide good performance, there should be more detailed performance analysis (e.g., a study to check the correlation of the model decisions and human judgments), error analysis (e.g., different types of errors, failure modes, and potential bias towards certain groups), and studies about the interpretability of the model (e.g., feature attribution of the model, visualization of the internal states of the model).
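For categorical labels, one common way to check how well model decisions agree with human judgments is Cohen's kappa, which corrects the raw agreement rate for the agreement expected by chance. The sketch below is a minimal implementation; the label lists and the function name `cohens_kappa` are hypothetical examples.

```python
from collections import Counter


def cohens_kappa(model_labels, human_labels):
    """Chance-corrected agreement between two label sequences of equal length."""
    if len(model_labels) != len(human_labels) or not model_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(model_labels)
    observed = sum(m == h for m, h in zip(model_labels, human_labels)) / n
    model_counts = Counter(model_labels)
    human_counts = Counter(human_labels)
    # Agreement expected if both raters labeled independently at their own rates.
    expected = sum(model_counts[c] * human_counts[c] for c in model_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up sentiment labels from a model and from a human coder.
MODEL = ["pos", "pos", "neg", "neg"]
HUMAN = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(MODEL, HUMAN))
```

Computing the same statistic separately for texts from different groups of authors is a simple first step towards the error analysis mentioned above, since a large gap between groups can signal bias.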
Apart from the limitations of the technical methodology, there are also ethical considerations arising from the use of NLP. Among the use cases introduced in this chapter, some applications of NLP are relatively safe as they mainly involve analyzing public political documents and fact-based evidence or effects of policies. However, others could be concerning and vulnerable to misuse. For example, although effective, truthful policy communication is beneficial for society, it might be tempting to overdo policy communication and optimize for votes by any means. As it is highly important for government and politicians to gain positive public perception, overly optimizing policy communication might lead to propaganda, intrusion of data privacy to collect more user preferences, and, in more severe cases, surveillance and violation of human rights. Hence, there is a strong need for policies to regulate the use of technologies that influence public opinions and pose a challenge to democracy.

Conclusions
This chapter provided a brief overview of current research directions in NLP that provide support for policymaking. We first introduced four main NLP tasks that are commonly used in text analysis: text classification, topic modeling, event extraction, and text scaling. We then showed how these methods can be used in policymaking for applications such as data collection for evidence-based policymaking, interpretation of political decisions, policy communication, and investigation of policy effects. We also discussed potential limitations and ethical considerations of which researchers and policymakers should be aware.
NLP holds significant promise for enabling data-driven policymaking. In addition to the tasks overviewed in this chapter, we foresee that other NLP applications, such as text summarization (e.g., to condense information from large documents), question answering (e.g., for reasoning about policies), and culturally-adjusted machine translation (e.g., to facilitate international communications), will soon find use in policymaking. The field of NLP is quickly advancing, and close collaborations between NLP experts and public policy experts will be key to the successful use and deployment of NLP tools in public policy.

Fig. 3 Given a collection of text documents, topic modeling generates a list of topic clusters.
"Israel --bombs--> Hamas sites" with the location "Gaza." Event extraction usually incorporates both entity extraction (e.g., Israel, Hamas sites, and Gaza in the previous example) and relation extraction (e.g., "bombs" in the previous example).