Assessing behavioral data science privacy issues in government artificial intelligence deployment

In today's global culture, where the Internet has established itself as the main tool for communication and commerce, the capability to massively analyze and predict citizens' behavior has become a priority for governments in terms of collective intelligence and security. At the same time, in the context of the novel possibilities that artificial intelligence (AI) brings to governments in terms of understanding and developing collective behavior analysis, important concerns related to citizens' privacy have emerged. In order to identify the main uses that governments make of AI and to define citizens' concerns about their privacy, in the present study we undertook a systematic review of the literature, conducted in-depth interviews, and applied data-mining techniques. Based on our results, we classified and discussed the risks to citizens' privacy according to the types of AI strategies used by governments that may affect collective behavior and cause massive behavior modification. Our results revealed 11 uses of AI strategies employed by governments to improve their interaction with citizens, organizations in cities, services provided by public institutions, or the economy, among other areas. In relation to citizens' privacy when AI is used by governments, we identified 8 topics related to human behavior predictions, intelligent decision-making, decision automation, digital surveillance, data privacy law and regulation, and the risk of behavior modification. The paper concludes with a discussion of the development of regulations focused on the ethical design of citizen data collection, where implications for governments are presented aimed at regulating security, ethics, and data privacy. Additionally, we propose a research agenda composed of 16 research questions to be investigated in further research.


Introduction
In recent years, the development of artificial intelligence (AI) has led to the adaptation of organizational models in both companies and public organizations (Brynjolfsson & Mitchell, 2017). In today's global culture, where the Internet has established itself as the main tool of communication, the global system of economy and regulations, as well as data and decisions based on behavioral analysis, have become essential for public actors (Ballestar, Camiña, Díaz-Chao, & Torrent-Sellens, 2021; Irvin & Stansbury, 2004).
In the context of this connected society, the conceptualization, definition, and establishment of both theoretical and legal parameters that would set ethical and efficient limits on the analysis, treatment, and use of citizens' data have become a challenge in scientific, legal, and professional settings (Kamolov & Teteryatnikov, 2021; Narayanan, Huey, & Felten, 2016). As studied by Zuboff (2019b), numerous concerns regarding user privacy have emerged, particularly when setting parameters for governments to make decisions regarding how to apply AI to understand behaviors in society (Hiller & Bélanger, 2001), predict its actions and movements (Altman, Wood, O'Brien, Vadhan, & Gasser, 2015), and act accordingly. Of note, AI refers to the simulation of human intelligence linked to the development of algorithmic models that automatically work and learn by themselves through inputs developed by humans (Nagtegaal, 2021).
In recent years, the use of AI by corporations and governments has grown exponentially (Zuiderwijk, Chen, & Salem, 2021). This growth has been driven by the many benefits of AI, such as the analysis of large amounts of data, predictions with high accuracy rates, identification of trends and patterns, prediction of complex associations, improvement of profitability, and analysis of financial ratios and risks, among other uses (Mikalef et al., 2021).
In parallel to the increase in the use of AI by governments, and as a consequence of the evolution in data and behavioral analysis practices, the concept of behavioral data sciences (BDS) has been developed to combine a multitude of issues related to data science and behavior (Harari et al., 2016). Although the very term "behavioral data sciences" does not appear in the scientific literature, several previous studies, including Agarwal and Dhar (2014) and Van Der Aalst (2016), have directly defined the future guidelines for its development. Therefore, the term BDS refers to a new and emerging interdisciplinary field that combines techniques from the behavioral sciences, psychology, sociology, economics, and business, and uses processes from computer science, data-centric engineering, statistical modeling, information science, and mathematics in order to understand and predict human behavior using AI (Saura, Palacios-Marqués, & Iturricha-Fernández, 2021). In essence, BDS is a mix of disciplines that combines knowledge of the data that users or citizens publicly generate on the Internet, known as user-generated content (UGC) or user-generated data (UGD), through the use of mobile applications and other connected devices, such as the Internet of Things (IoT), smart homes, and self-driving cars, or through connected smart-city services (Schreiner, Fischer, & Riedl, 2019).
With the use of techniques focused on BDS, governments could apply AI-based algorithms and systems that analyze behavior (Grimmelikhuijsen, Jilke, Olsen, & Tummers, 2017) and identify patterns to explore knowledge about society (Men & Tsai, 2014) as well as its consumers or users (Chen et al., 2020; Chen et al., 2021). In this study, the use of the BDS concept is specifically linked to the analysis of AI strategies developed by governments to date. Since the term does not appear in any relevant form in the published scientific literature, the present study is pioneering and original in this respect.
Furthermore, corporations not only leverage users' and clients' data to improve their products and services, but also use them as a currency of exchange with interested third parties, such as governments or other public institutions (Silverman, 2017). Therefore, by studying user behavior data, companies and governments develop sophisticated prediction machinery that follows an economic logic, helping corporations generate more money at the expense of users and citizens (Zuboff, 2019a). Likewise, when used in government actions, AI raises concerns about privacy and personal security (Yang, Elisa, & Eliot, 2019). While predictions are not equal to observations, the more data obtained from society, the greater the ability to predict; accordingly, predictions can reach the same level of effectiveness as observations (Zuboff, 2019a). Therefore, if governments possess this intelligence, and if it is also automated based on AI, privacy and free decision-making in society could be under threat (Mazurek & Małagocka, 2019).
In this context, a key notion in this field is the concept of surveillance capitalism. According to Cinnamon (2017) and Zuboff (2019b), in surveillance capitalism, user experience and behavioral data are used as economic drivers to create a new economy where profits come from predicting how users behave. Considering this new concept, governments can take action and use AI as a tool focused on BDS. However, as stated by Bromberg, Charbonneau, and Smith (2020), such use can violate citizens' privacy and security. For example, by using AI and BDS, governments can interfere with the behavior of society to achieve a change in behavior without society being aware of it (Zuboff, 2015). There is also evidence of how governments can use AI to predict election results, using massive data to change the voting intentions of thousands of users (Isaak & Hanna, 2018). This was the case of US Facebook users' behavioral data that, when analyzed with behavioral prediction algorithms such as the one developed by Cambridge Analytica (Heawood, 2018), were employed to influence the results of the 2016 US presidential campaign between Donald Trump and Hillary Clinton (Cadwalladr & Graham-Harrison, 2018). Another example is the famous German doll Cayla, which recorded fragments of children's conversations (Haynes, Ramirez, Hayajneh, & Bhuiyan, 2017); the company then sold those data to Nuance Communications, which, in turn, developed voice recognition software and sold it to the US Central Intelligence Agency (CIA) (Madnick, Johnson, & Huang, 2019). In this case, we can speak of government suppliers providing AI-related services that are unethical from citizens' point of view.
In this context, following events of this type, in which AI, governments, and the data collection capacity of corporations are called into question, essential questions regarding the knowledge, authority, and power of government use of BDS techniques should be explored. Furthermore, understanding the predictive ability that government institutions might obtain if they train AI models that can predict user behavior is a prerequisite for any society to feel confident about implementing new technologies (Hobolt, Tilley, & Wittrock, 2013). Of note, data predictions and models that work with the prediction of human behavior are becoming dominant forms of capitalism and generate new business models and new products in the form of data (Zuboff, 2019b). BDS is thus a clear priority for the development of ethical strategies by governments when they implement AI, as it is presented as a new concept linked to user privacy, AI deployment in governments, and behavioral analytics, bringing together all of the above in the form of analysis of society's behavioral data.
However, several unanswered questions remain, such as: what is the legitimacy of predicting user behavior? And to whom do these behavioral data belong? Based on the privacy concerns outlined above and the originality of the study justified under the new emerging concept of BDS, to the best of our knowledge, none of the previous studies has identified and described the risks to citizens' privacy posed by governmental implementation of AI. Furthermore, there has been no research linking the concept of BDS to the main uses of AI by governments. Thus, we seek to fill a gap in the literature by exploring the possible uses of, and risks to citizens' privacy from, governments' implementation of AI in their strategies under the new BDS conceptual framework. To this end, this study first develops a systematic review of the literature to establish and confirm the main scientific contributions to date in this field of study. Secondly, based on the results of the systematic review of the literature, 15 interviews were conducted; of the informants, 11 worked in the government, 2 were economists for the government, and 2 belonged to organizations that advise the government. Thirdly, based on the coded results of the interviews, two data-mining techniques (topic modeling and textual analysis) were applied to identify insights and create knowledge related to the object of study. Following this approach, the present study aims to identify and discuss the main practical and theoretical implications for governments when using AI-based strategies with BDS techniques.
Therefore, in order to cover the identified gap in the literature, the present study addresses the following research questions (RQs). RQ1: What kind of citizens' privacy issues are expected when governments use behavioral-based AI in their strategies? RQ2: What AI techniques can governments develop to predict society's behavior?
With the development of the study and the answers to the RQs, this study also intends to attain the following specific objectives:
▪ To identify definitional perspectives of behavioral data science privacy issues in government AI deployment
▪ To explore the types of behavioral data science approaches used in governments
▪ To create knowledge about government AI deployment that preserves society's privacy
▪ To outline future guidelines to track new challenges in behavioral analytics and government AI deployment
Based on the results, we discuss theoretical implications regarding the application of AI strategies used by governments that respect the privacy of citizens' data. In addition, the main contributions to date are theorized in relation to the management of user data and the need to regulate the security, ethics, and privacy of user data. Similarly, we also discuss practical implications that form a guide for the application of AI strategies by governments that avoid any type of privacy violation linked to surveillance capitalism actions.
The remainder of this paper is structured as follows. In Section 2, the theoretical framework of the study is presented. Section 3 discusses the methodological approaches used. Section 4 reports the results. Section 5 provides a discussion of important theoretical contributions and future directions that our results offer for the analysis of BDS privacy issues in government AI deployment. Conclusions, along with a discussion of theoretical and practical implications, are presented in Section 6.

Understanding surveillance capitalism and behavioral data sciences
As argued by Zuboff (2019b) and Belhadi et al. (2021), we are living through one of the deepest transitions of the information age, namely, into an ecosystem where data are the largest source of information. Seeking to outline a theoretical background with the main concepts used to analyze and predict user behavior in the digital ecosystem, this section identifies the main theoretical perspectives used in the literature to analyze the factors that contribute to the development of AI in governments.
For their part, governments need to stay up to date and use the latest technologies to understand the demands of society (Figenschou, 2020). However, according to many initiatives, the regulation of the Internet itself is not working, and society demands that its data remain anonymous at all costs (Zuboff, 2019b). This raises concerns about user privacy (Ribeiro-Navarrete, Saura, & Palacios-Marqués, 2021). Users are aware that, based on the analysis of human experiences linked to behavioral data, governments can turn their actions into sophisticated intelligent machines capable of predicting any issue targeted by governments (Kavanaugh et al., 2012). Therefore, the ultimate goal will always be to understand the future behavior of the society regulated by governments (Linders, 2012).
Under this paradigm of privacy concerns about AI and its implementation by governments to monitor, actively listen, trace possible states of alarm, or predict any kind of event that negatively affects society, the concept of surveillance capitalism was born (Cinnamon, 2017; Zuboff, 2015; Zuboff, 2019b). The concept of surveillance capitalism holds that human experience is unilaterally appropriated as a data source to predict human behavior (Zuboff, 2019a). While there are indeed objectives of improving services and understanding society's behavior to improve the public offer by governments (Andrew & Baker, 2019), the concept of surveillance capitalism also implies that humans are used as products of massive data production to improve the economic profitability of companies at the expense of data about user behavior (Zuboff, 2015).
In circumstances where ethical action is lacking, companies and governments use behavioral data to make society behave in ways that are more convenient for obtaining greater economic benefits (Zuboff, 2015). When viewed from a business perspective, this leads to an increasing number of Internet-centered business models that cater to addictive behavioral patterns (Hou, Xiong, Jiang, Song, & Wang, 2019). In this way, users generate more data about their behavior; accordingly, their attitudes and feelings can be predicted. Then, based on these actions, companies and governments generate more profit from advertising (Palos-Sanchez, Saura, & Martin-Velicia, 2019) and products shared in these business models (Dwivedi, Kapoor, & Chen, 2015), or by using user behavior data as the basis of data-centered strategies (Dwivedi et al., 2018).
In surveillance capitalism, the main source of data is the information generated by users while using connected devices. All this information is analyzed using BDS, which takes a new perspective of analysis through a combination of different fields of research (Zhuoxuan, Yan, & Xiaoming, 2015). In recent years, the number of tools used by both governments and companies to obtain data has considerably increased (Paul & Aithal, 2020). In fact, many variables indicate parameters for measuring user behavior on the Internet or through mobile and connected devices (Hobolt et al., 2013).
Until now, the main sources of data have been websites, cell phones, intelligent organization systems, Customer Relationship Management (CRM) systems, and marketing automation sources, among others. This type of data always generates categories known as events or objectives, which have the purpose of explaining some properties defined by the organizational structure of the data-analysis system (Abou Elassad, Mousannif, Al Moatassime, & Karkouch, 2020). However, as mentioned above, the number and type of connected devices have recently increased exponentially, from the IoT to smart-city services, among other connected devices (Kankanhalli, Charalabidis, & Mellouli, 2019).
The understanding of user behavior data on the Internet has led to the emergence of new digital marketing strategies in the business ecosystem (Dwivedi et al., 2020). It is not the first time that the business ecosystem offers opportunities and benefits to government institutions to maximize their processes (Zhang, Wang, & Zhu, 2020), increase the efficiency of their strategies (Pencheva, Esteve, & Mikhaylov, 2020), or create new listening tactics (Macnamara, 2015). Following these considerations, Table 1 presents the main concepts related to BDS analysis that can be used by governments to monitor user behavior through the data they generate.

Main user behavior data sources used by governments in their monitoring strategies
In the ecosystem that drives the development of an economy based on behavioral analysis and its data, the importance of data sources can hardly be overestimated (White & Boatwright, 2020). As argued in many previous studies, since users are not fully aware that their data will be used and sold to interested third parties for a financial contribution (Acar, Englehardt, & Narayanan, 2020), ethics is not an essential component of new business models focused on massive data collection and analysis (Löfgren & Webster, 2020).
Moreover, several available initiatives, such as the new GDPR legislation introduced by the European Union through the European Commission to protect users against abusive uses of their data, are insufficient (Sørensen & Kosta, 2019). Although the new regulation obliges companies to explicitly specify how data are used, in reality, users do not possess the knowledge necessary to understand the privacy policies and legal notices of the applications on their mobile computers and other types of connected device (Martin, 2015).
There is evidence that, due to the psychological phenomenon known as "instantaneous reward," despite some awareness of privacy issues, the millennial generation and digital natives prefer to use applications as fast as possible instead of taking time to understand how their data will be used (Hull et al., 2004). In many situations, including the recent Covid-19 pandemic, the issue of user privacy has raised many concerns (Maher, Hoang, & Hindery, 2020). During the Covid-19 crisis, in order to track Covid-19 infections and notify citizens if they have been in contact with an infected person, many governments decided to ask citizens to use applications that track their location (Gerard, Imbert, & Orkin, 2020).
A recent analysis of these novel monitoring techniques by governments suggested that citizens are frequently exposed to this type of active listening (Maher et al., 2020; Zhou, Yang, Xiao, & Chen, 2020). The sources of data to which governments may have access have been studied previously within several projects, such as the one published by The New York Times (Thompson & Warzel, 2019). Specifically, Thompson and Warzel (2019) highlighted many decision-making concerns that arise when governments have access to the multiple companies collecting information from users. Governments then use user data to improve their processes of monitoring society and its behavior (Thompson & Warzel, 2019), thereby prioritizing the issue of national security. With this type of strategy, user behavioral data serve as a source that governments use to train their machine-learning algorithms. Accordingly, behavioral data analysis is of paramount importance for governmental strategies focused on AI (Ribeiro-Navarrete et al., 2021).
Of note, user data can be transferred by third parties to governments (Thompson & Warzel, 2019). As described by Saura, Ribeiro-Soriano, and Palacios-Marqués (2021b), data sources on user behavior can be of the following three types: (i) public, i.e., when users are aware that the information they generate is in the public domain; (ii) private, i.e., when users know that the information they generate will be used exclusively for their personal use; and (iii) transferred, i.e., when behavioral data are transferred as products to third parties, including governments and public or private institutions (Saura, Ribeiro-Soriano, & Palacios-Marqués, 2021c). Table 2 summarizes the data sources that can be used by governments for BDS analysis.
The data sources shown in Table 2 are examples of the multitude of citizen behavior data sources that governments can use to obtain information for further analysis with AI (Cate, 2008). In this context, it is unsurprising that society's concerns about privacy continue to grow (LaBrie, Steinke, Li, & Cazier, 2018). If used by governments in their systems for the control, prediction, and analysis of user behavior, these data sources can affect the privacy and security of citizens' personal data.

Systematic review of the literature
To better understand the main uses of AI by governments as studied in the scientific literature to date, we conducted a systematic review of the literature (de Camargo Fiorini, Seles, Jabbour, Mariano, & de Sousa Jabbour, 2018). Systematic literature reviews are exploratory research approaches used to understand emerging new fields of study (Kraus, Breier, & Dasí-Rodríguez, 2020). A major reason underlying the recent increase in the number of systematic literature reviews is that a literature review makes it possible to outline a theoretical framework with the main agents that contribute to the development of the proposed research objective. The aim of a systematic literature review is thus to analyze an emerging issue and to identify the main techniques employed to study it. Accordingly, systematic reviews are an effective method to pursue the proposed objectives related to AI uses in governments and citizens' privacy (Zuiderwijk et al., 2021).
In the present study, we followed the procedure developed by Bem (1995), who proposed that a systematic review should be divided into the following three steps. In the first step, the topics to be discussed within the scientific area are identified. To this end, keywords are identified that can summarize the objective of the research through searching databases (Sarkis, Zhu, & Lai, 2011). In the second step, the searches in these databases are performed, the collected data are filtered, and the results are analyzed (Akter & Wamba, 2016). During the filtering process, the titles, abstracts, and keywords of potentially relevant studies are examined. This is followed by an analysis of the content of the articles, and their suitability for the review is assessed. Studies that do not meet the criteria are excluded from the systematic review process. In the third step, the content of the contributions retained in the sample is analyzed, and the main concepts are discussed (Zeng, Hu, Balezentis, & Streimikiene, 2020).
In the present study, final contributions were selected during a review process that focused on identifying the main purpose of each potentially relevant study (Akter et al., 2019). The searches were conducted in the following databases: Web of Science (WOS), IEEE Xplore, ScienceDirect, ACM Digital Library, and AIS Electronic Library. The keywords used to search the databases were "Government" OR "Governance" OR "Public Management" OR "Public Sector" OR "Public Administration" OR "Public Policy" OR "State" OR "Municipality" OR "Citizens" AND "Artificial Intelligence" OR "AI" OR "Predictive Analytics" OR "Intelligence Systems" OR "Expert Systems" OR "Collective Behavior" OR "Surveillance Capitalism" OR "Behavioral Analysis". The searches were performed between October 5 and 10, 2020, and updated in January 2022. Of note, the search term BDS was not used in this process, since the results of government studies using AI were analyzed from the perspective of BDS as a new emerging concept, which this study thoroughly outlines and defines in the results.
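As an illustration, a search string of this form can be assembled programmatically so that the same query is applied consistently across all five databases. The sketch below is ours, not the authors' tooling, and the grouping of the terms into two OR-blocks joined by AND is our assumption about the intended query logic:

```python
# Illustrative sketch: building the review's boolean search string.
# The two-block (government terms) AND (AI terms) grouping is assumed.
government_terms = ["Government", "Governance", "Public Management",
                    "Public Sector", "Public Administration", "Public Policy",
                    "State", "Municipality", "Citizens"]
ai_terms = ["Artificial Intelligence", "AI", "Predictive Analytics",
            "Intelligence Systems", "Expert Systems", "Collective Behavior",
            "Surveillance Capitalism", "Behavioral Analysis"]

def or_block(terms):
    """Join quoted terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

query = f"{or_block(government_terms)} AND {or_block(ai_terms)}"
print(query)
```

Keeping the query as data rather than a pasted string makes it straightforward to log exactly what was submitted to each database during the October 2020 and January 2022 search rounds.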
The results of the process were as follows. In WOS, 20 articles were selected from a total of 65 potentially relevant results; in ScienceDirect, 7 results were selected from a total of 29 studies; in the AIS Library, the total number of potentially relevant studies was 3, of which only 1 was retained in the final dataset; in the ACM Digital Library, a total of 4 studies were found, of which 2 were selected; finally, in IEEE Xplore, of a total of 20 potentially relevant studies, 4 were selected. Therefore, after the selection process, a total of 34 research studies were included in the present study. For the exclusion criteria, we followed the PRISMA evidence-based minimum set of items (Saura, Ribeiro-Soriano, & Palacios-Marqués, 2021a) aimed at filtering quality research studies. First, the abstracts and keywords of the articles were analyzed to identify terms unrelated to the objectives of the study. Second, an in-depth analysis of the articles identified as suitable was performed. Next, we analyzed whether the objectives of each study were directly or indirectly linked to the objectives of the present research. Then, we determined whether the topic was related to the research objectives. Additionally, we assessed whether the quality of the methodology and the evaluation of results were acceptable. Finally, articles that did not describe or specify terms appropriate to the objectives of the present study were excluded. Table 3 provides further details on the studies included and analyzed in the present study.
Following the methodological indications outlined in Snelson (2016), Collins et al. (2021), and Mikalef et al. (2021), once the systematic review of the literature had been conducted to verify the validity of the theoretical underpinnings, we proceeded to the second part of the proposed approach. Once the relevance of governments' uses of AI and BDS was justified, we structured and designed the interviews based on the results of the systematic literature review. Details of this approach are presented below.

In-depth interviews
Seeking to obtain additional knowledge regarding the uses of AI by governments and the concerns related to citizens' privacy, we conducted in-depth interviews with informants working in governments. Following the guidelines proposed by MacDougall and Fudge (2001), our qualitative interviews were held with politicians, senators, and other government-related officials in Spain.
The ultimate goal of these interviews was not to quantitatively assess the studied phenomenon, but rather to gain a deep understanding of it by obtaining information from an original primary source. The importance of such a qualitative approach was previously justified by Orlikowski and Baroudi (1991) and Roberts (2015). Subsequently, the content of the interviews was used to build theory and extract insights.
We conducted a total of 15 interviews on user privacy and AI strategies developed by governments. Of these 15 interviews, 5 were conducted by phone (Pell et al., 2020), 2 by video call (Lukacik, Bourdage, & Roulin, 2020), 4 in person (Lukacik et al., 2020), and 4 by email (McKinley, Fong, Udelsman, & Rickert, 2020). In all cases, the interviews were digitally coded for further analysis under the Natural Language Processing (NLP) framework. Of the 15 informants, 11 worked in the government, 2 were economists for the government, and 2 belonged to organizations that advise the government (see Table 7). The informants were Spanish (12), Venezuelan (1), Egyptian (1), and Colombian (1) nationals. Their identities are anonymized in the present study (Natow, 2020). The three interviewees who were not native Spaniards live in Spain and work for governments or corporations linked to economics, finance, and politics. All interviewees are linked to the Club Financiero Génova (CFG) in Madrid, a club focused on economic development, business, and politics. The interviewees were informed of the interviews at various events held at the CFG and contacted afterwards. The interviews were conducted in Spanish and translated into English.
Of note, as reported by the European Commission, Spain has developed a strategy report to monitor the development, uptake, and impact of AI in its government actions. The Spanish government has stated that it uses AI to facilitate the development and deployment of the economy and society. Its strategy adopts a multidisciplinary approach to address economic, social, environmental, public management, and governance challenges, and it includes perspectives from a wide range of sectors and disciplines (European Commission, 2020).
In-person and video call interviews lasted about 30-40 min each. Telephone interviews averaged 20-25 min in length. Email interview responses averaged 600-750 words each. Interview data were collected between October 15, 2020, and January 8, 2021. The questions are shown in Appendix A. The informants were selected based on the work they do or have previously done in the government. All informants were linked to public administrations, governments, political parties, or advisory bodies to the government. Our interviews were semi-structured and included open-ended questions. Table 4 shows the characteristics of our informants based on their role, industry of specialization, professional status, organization, and nationality.
The main reason for asking open-ended questions in our interviews was to address a wider range of experiences (Dhillon & Torkzadeh, 2006). As noted above, the interview data were then transcribed and coded using exploratory data-based techniques (Bacq, Janssen, & Noël, 2019; Cooke-Davies & Arzymanow, 2003). The interviews received via email were used directly in their original format and sent for coding in the global database. The demographic characteristics of the informants are summarized in Table 5.

Data-mining techniques: Using LDA and TA to extract insights
In the last decade, data-mining techniques have been used extensively in the scientific literature (Yang & Wu, 2006). These techniques are used to create knowledge and extract insights from both structured and unstructured databases (Wu et al., 2003). A combination of several data-mining processes can provide truly relevant insights into the objects under study (Jindal & Borah, 2013).
In the present study, two data-mining processes were combined: Latent Dirichlet Allocation (LDA) and Textual Analysis (TA). The first is a topic-modeling algorithm, implemented in Python, used to extract insights in the form of topics. LDA was applied to the database containing the content of the in-depth interviews (Blei, Ng, Jordan, & Lafferty, 2003; Pritchard, Stephens, & Donnelly, 2000). The novelty of this approach is that we used a methodology typically applied to large secondary text corpora to explore primary interview data. These considerations follow Krippendorff (2013) on the process of content analysis.
Specifically, the algorithm applied by LDA identifies the most relevant words in the analyzed documents. In the present study, each interview was considered as a document. Using the topic-modeling process with LDA, we identified approximately 10 words for each document. These words were then used to form the names of topics in the data. This is a standard process in the use and development of LDA within the NLP framework. In the present study, the LDA process was computed with Python LDA 1.0.5 software.
Second, to complement the qualitative analysis outlined above with a quantitative assessment, we computed the keyness values of the identified topics. Keyness is a statistical indicator, also known as the log-likelihood score, that measures the relevance of a term (Rayson & Garside, 2000). This metric provides statistical meaning; in earlier work (2010), a log-likelihood score of 3.8 or higher was reported to be statistically significant at p < 0.05. Therefore, the interview conversations were established as input phrases, and text documents were considered as sub-corpora of the original corpus. Statistical significance in this study was set at p < 0.05 (Drmota, Szpankowski, & Viswanathan, 2012). Furthermore, we used textual analysis computed in Python (Anand, Bochkay, & Chychyla, 2020). With this approach, it is possible to identify values in the form of insights using in-depth content analysis (Millstein, 2020). Specifically, we studied variables related to the weighted percentages/frequency of keywords in the database composed of the set of interviews (McHugh et al., 2020). In this way, the relevance of certain keywords was obtained (Auer, 2018). Based on the percentages of relevance achieved, we established parameters that explained the objectives of the present study (Saura, Ribeiro-Soriano, & Palacios-Marqués, 2021b). This exploratory approach follows the indications of content analysis using the NLP framework.
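The keyness computation described here can be sketched as follows; the contingency counts and corpus sizes are hypothetical, and the formula is the standard Rayson and Garside (2000) log-likelihood with the 3.8 significance threshold mentioned in the text.

```python
import math

def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood keyness score for one word (Rayson & Garside, 2000).

    a: frequency of the word in the sub-corpus (e.g., one interview)
    b: frequency of the word in the reference corpus
    c: total tokens in the sub-corpus
    d: total tokens in the reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the sub-corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

# A score of about 3.8 or higher is significant at p < 0.05 (chi-squared, 1 df).
score = log_likelihood(a=30, b=40, c=1000, d=10000)
print(f"LL = {score:.2f}, significant: {score >= 3.8}")
```

A word that occurs far more often in one interview than its share of the whole interview database would predict thus receives a high keyness score.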
An analysis of the main n-grams collected in the coded text of the interviews was also performed. To compute the n-gram analysis, we followed Wu and Su (1993), who argued that statistical analysis of the measure known as mutual information (MI) is justified when using textual analysis and n-grams. This indicator refers to the probability of co-occurrence of two correlated variables. Likewise, Bouma (2009) and Iyengar et al. (2012) used the MI indicator between random variables X and Y, which can be computed from the marginal probabilities p(x) and p(y) and the joint probability p(x, y).
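A minimal sketch of the MI computation for a bigram, assuming a simple whitespace-tokenized stream; the toy token sequence is illustrative, and the pointwise form log p(x, y) / (p(x) p(y)) is used.

```python
import math
from collections import Counter

def pmi(tokens: list, x: str, y: str) -> float:
    """Pointwise mutual information of the bigram (x, y) in a token stream."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    p_x = unigrams[x] / n              # marginal probability p(x)
    p_y = unigrams[y] / n              # marginal probability p(y)
    p_xy = bigrams[(x, y)] / (n - 1)   # joint probability p(x, y)
    return math.log2(p_xy / (p_x * p_y))

tokens = ("artificial intelligence helps government , artificial "
          "intelligence predicts behavior").split()
# Words that co-occur more often than chance yield a positive score.
print(pmi(tokens, "artificial", "intelligence"))
```

A strongly associated pair such as "artificial intelligence" scores well above zero, whereas unrelated word pairs score near or below zero.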

Results of systematic literature review
According to the results of the systematic literature review, in the studies included in the dataset, we identified the main uses that governments make of AI. In this way, the interviews were developed based on the results of this methodological process. Furthermore, to complement the results obtained through our systematic literature review as indicated previously, we conducted interviews with informants who work or have worked in governments. The interviews were based on the main concepts related to user privacy and governments' use of AI found in the literature (see Table 6). Therefore, the aim of the interviews was not only to understand new uses of AI, but also to obtain information regarding user privacy and how user information is treated, based on the results presented in Table 6 from the literature review.
Regarding the major identified uses of AI by governments, the main one is the continuous development of new models that increase the efficiency of results (Chamola et al., 2020). This is a characteristic of AI, since the more machine-learning models are trained, the greater their predictive efficiency, for instance in finance when the objectives are focused on profitability. Likewise, the uses focused on decision making for process improvement and the evolution of management and governance practices were also remarkable (Skaug Saetra, 2020).
In this way, techniques are used to understand and optimize interactions with citizens (Androutsopoulou et al., 2019) through channels such as social networks (Saura, Palacios-Marqués, & Iturricha-Fernández, 2021; Silva et al., 2015), as well as information systems or data exchange platforms. Automation and the use of models and algorithms are increasingly widespread in governments, as they, through new technologies linked to AI (chatbots, IoT, smart cities, among others), try to collect databases that can predict how society is organized, determine financial models, and improve the optimization of industries and cities (Silva et al., 2015; Zato et al., 2011). Similarly, in order to cover the objectives proposed in the present study, Table 7 details the main privacy issues for users and citizens, and concepts linked to AI uses found in the literature review. Of note, concepts linked to the use of AI and the security of user data, the prediction and analysis of behavior, as well as its modification, were considered (see Zuboff, 2019a, 2019b).
Our analysis of the privacy issues and concepts found in the literature review and presented in Table 7 highlights the ease with which governments have access to citizen data to train AI models (Engin & Treleaven, 2019; Wong, 2019). Public institutions try to address this with initiatives for good governance (Martín & León, 2015). However, citizen privacy is a human right directly linked to the legitimate use of data and access to citizen information (Chatterjee & Sreenivasulu, 2019). Predicting citizens' behavior based on the data they generate, in economic, social, or health terms, is relatively easy with the numerous data-analysis techniques that use AI to make predictions (Biros, 2020).
The problem lies mainly in the data protection regulations that may allow governments to use these techniques without violating citizens' privacy, as highlighted by Shneiderman (2020). In this way, a balance must be found between governments' use of citizens' data to predict behavior and the protection of citizens' privacy. This can be done by improving economic, social, or cultural indicators (Polat & Alkan, 2020; Skaug Saetra, 2020). If governments develop strategies focused on profitability indicators, citizens' data become economies of scale that can lead to illicit BDS practices that modify, whether intentionally or unintentionally, citizens' behavior (Ribeiro-Navarrete et al., 2021).

LDA and textual analysis of interview data
Using the LDA process, a total of 7 topics were identified. Of these, 4 topics were related to privacy issues (Human behavior, Behavioral predictions, Data privacy law and regulation, and Risk of behavior modification) and 3 further topics were related to AI deployment by governments (Intelligence decision making, Digital surveillance, and Decision automation). Table 8 summarizes the identified topics, their descriptions, and the corresponding indicators of keyness and p-value.
Based on the results of textual analysis, the most frequent words are presented in Table 9. In addition to measuring the weighted percentage in the entire database, the keywords were grouped by similarity.
To obtain additional insights using data mining, we also defined n-grams supported by placement analysis, which takes into account the contexts where words occur in a corpus (Biber, 2004; McEnery & Hardie, 2013). In this way, we analyzed the position of the main words in the database, with a particular focus on the place where a word is positioned. A placement with a strong and stable relationship is also called a lexical bundle or n-gram. Table 10 lists the identified n-grams presented by rank (R), with the words identified in Table 9.
Here, frequency refers to the total frequency of appearance of the collocates in the in-depth interview database. As indicated in Saura, Ribeiro-Soriano, and Iturricha-Fernández (2022), this is the sum of Freq L of the words that appear to the left of the topic word and Freq R of the words that appear to its right.
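The Freq L / Freq R decomposition described here can be sketched as follows; the node word, window size, and toy token stream are illustrative assumptions.

```python
from collections import Counter

def collocate_freqs(tokens, node, window=1):
    """Count collocates appearing to the left (Freq L) and right (Freq R)
    of a node word; total frequency is the sum of the two counts."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            for j in range(max(0, i - window), i):
                left[tokens[j]] += 1
            for j in range(i + 1, min(len(tokens), i + window + 1)):
                right[tokens[j]] += 1
    total = left + right  # Counter addition sums Freq L and Freq R
    return left, right, total

tokens = "data privacy matters because data privacy is a right".split()
left, right, total = collocate_freqs(tokens, "privacy")
print(total.most_common(3))
```

Ranking `total` by frequency reproduces the kind of collocate table reported above for each topic word.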

Discussion
In the present study, we explored the main uses and techniques of AI developed by governments, as well as the main concerns related to user privacy. Our analysis of the qualitative interviews using several data-mining techniques yielded several important insights.
Overall, governments consider knowledge of citizens' behavior to be key for the success of good governance (Chatterjee, 2019). However, as demonstrated in previous research applying AI techniques to human behavior, both the predictions and the correlations that can be identified in collective behavior analysis pose serious risks to user privacy (e.g., Biros, 2020).
Furthermore, the literature review process provided a thorough understanding of the main research developed in these fields, identifying 13 uses related to AI in governments and 11 issues related to citizens' privacy. As stated by Zuiderwijk et al. (2021), these insights can be used to outline the interviews as an additional method, as well as to create theory and knowledge in relation to the studied topic.
Similarly, in relation to the identified interview topics related to the predictions of behavior, it becomes clear that both the feelings and the actions derived from the analyzed data can be studied to optimize processes or to control the population. This suggests that there is a risk that governments can develop actions linked to surveillance capitalism or perform illegitimate collective behavior analysis (see Zuboff, 2019a, 2019b), if these are used without caution. However, as indicated by Kamolov and Teteryatnikov (2021), these actions are powerful tools that can drive smart and good governance.

Table 7
Main privacy issues and concepts found in the literature review. Source: The authors.
Although behavioral reactions can be meaningfully used to prepare for possible states of alarm, there is also a need to explore ways to prevent, for example, cyberattacks that may jeopardize user privacy; similarly, there is an urgent need to explore the limits of privacy when studying how the population will act (Engin & Treleaven, 2019). In this regard, Informant I pointed to the following issue: "We use artificial intelligence to predict possible criminal acts in the city. When artificial intelligence and our analyses tell us that there is a neighborhood where serious crimes, such as murder, can be committed, we increase the number of police patrols in those neighborhoods and, with this, we try to act more quickly".
Therefore, as mentioned by the aforementioned interview participant, although governments can effectively use AI techniques to prevent illegal actions (Jimenez-Gomez et al., 2020), when AI is embedded in government strategies and decisions are automated, such as in critical situations or Covid-19 pandemic alerts (Chamola et al., 2020), the risk of abusing user privacy increases (Chatterjee & Sreenivasulu, 2019), although, from the government's point of view, it optimizes decision-making processes and data-driven decisions (Ribeiro-Navarrete et al., 2021).
A similar point was made by Informant M: "In states of alert such as that generated by the COVID-19 pandemic, artificial intelligence has been used with statistical models to predict possible infections and deaths. These models have helped us both to improve health care and to manage the movement of people in cities when a lockdown has been necessary." The participant added, "but the use of applications to track the location of user devices, although always anonymously, has highlighted the need to regulate the use of both artificial intelligence technology and other similar technologies to control the population in some way." These indications contrast with the results reported by Zhu, Chen, Dong, and Wang (2021) in their study in China, where the control and prevention of pandemics or diseases with the use of AI becomes a priority to achieve governments' aims, while the alert state can justify the performed actions. From the quote above, we can conclude that, from the government perspective, privacy is treated as a powerful strategic asset that can be used for digital surveillance.
In today's digital era, the data generated by citizens can help anticipate their movements from a marketing perspective as well, such as inducing users to buy products or services (Martín & León, 2015) through manipulation on the Internet; this is typically done through an analysis of people's customer journeys (Dwivedi et al., 2020), online decision-making, or unethical strategies that create addiction on social networks.
Furthermore, Informant B stated that "Intelligence in governments has been used for several years. We focus mainly on listening and predicting possible causes affecting the State. However, it is true that there is still a wide range for the development of privacy and regulatory standards, and of how these technologies and their applications can use user data while respecting, or not, their privacy".
However, one of the challenges in terms of surveillance of society is to understand how governments can implement AI from the point of view of automatic decision-making (Chatterjee & Sreenivasulu, 2019). The prediction of user behavior is determined by the source of the data, which, in turn, can lead to digital manipulation of users (Stoica et al., 2013). Citizens should be aware of how governments will use their data and authorize (or not) the use of their data to train predictive models to, for instance, anticipate their movements and locations (Ribeiro-Navarrete et al., 2021).
Specifically, if AI is used by governments with a focus on making smart decisions, as in the case of the aforementioned informant, the risk to the privacy of users' information is lower (Jimenez-Gomez et al., 2020). However, when AI is consolidated in government strategies, and when decisions are automated, such as in critical situations or in a state of alarm due to the Covid-19 pandemic (Chamola et al., 2020), the risk of user privacy violations increases (Chatterjee & Sreenivasulu, 2019). Therefore, if governments have access to third-party data, and these are linked to national intelligence, which already has access to massive data, this could lead to possible manipulation and decision making focused on surveillance capitalism (see also Shneiderman, 2020).
In this respect, Informant N made the following observation: "Governments have access to a multitude of sources of data on citizens and users. However, governments always use legitimate sources of information and, a priori, they do not have access to third-party sources that can pass on personal data of users to governments for use in non-legitimate artificial intelligence models." He then continued: "And concerning the risk of manipulation of citizens and surveillance, national security processes increasingly use artificial intelligence, and as we know it works with data. The risk of manipulation does not exist because citizens are free in their actions, and artificial intelligence and automation are intended to predict how the tasks that the government performs can be optimized and are always legitimate." In this way, Informant N highlights the use of AI tools and strategies in the government for the management of decision making, the optimization of messages and conversations with citizens, and automation in public-domain decisions (see also Chen & Wen, 2021). As argued by Zuboff (2019b), the power of prediction through access to millions of data points can cause systematic violations of citizens' privacy, even without governments being aware of it due to a misunderstanding of the technology (see also Saura, Ribeiro-Soriano, & Palacios-Marqués, 2021a). Therefore, the automation of AI-focused decisions by governments should be regulated (Engin & Treleaven, 2019). Although governments attempt to use legitimate data sources, there are already examples in which, despite the legitimate intentions of the government, the companies that passed those data to governments had made illegitimate use of them (Thompson & Warzel, 2019).
Accordingly, the information generated by users both on the Internet and on digital devices must comply with a new regulatory framework for data protection. Users must have rights over their data and decide whether or not these data can be transferred to companies (Pencheva et al., 2018). However, at present, data serve as a currency: if, for instance, users want to use an application, they must a priori accept the privacy policy and, if they reject it, they will not be able to use that application. As indicated by Caudill and Murphy (2000), this can be understood as blackmail of users; indeed, in most cases, the option of buying an application without giving one's data to the company, which may subsequently sell those data, does not exist (Bennett & Raab, 2020). Yet this and other initiatives (Obar & Oeldorf-Hirsch, 2020) may make it possible to predict user behavior and use BDS without undermining collective behavior analysis.
In addition, with respect to surveillance capitalism and the use of AI by governments, specific regulation should ensure that the used data sources are legitimate and that they are not used, consciously or unconsciously, to manipulate the population so that to obtain economic benefits from both the government and the companies working with the data.

Future research agenda
The development of governmental uses and practices of AI has been defined as essential for the future of governance that supports new technologies (Chatterjee et al., 2021). The adoption and use of these technologies should focus on improving services to citizens and society in general. However, the challenges and risks of new forms of AI need to be properly understood and studied in the future (Chen & Wen, 2021). Accordingly, identifying the different techniques developed by governments for the acquisition and collection of massive citizen data becomes a priority. As argued by Zuboff (2019a), the ethical principles and values that ensure the privacy of citizens should be properly defined and classified so that governments can establish good practices in the future.
Applying AI in the processes developed by governments can help to predict the behavior of citizens. Accordingly, regulations must be developed so that governments can make legal use of tools to predict society's behavior (Wilson, 2022). Decision-centric tools working with AI and data automation must draw the line between decisions that need to be made by humans and those made by machines. As indicated by Al-Mushayt (2019), the automation of data analysis and behavior-prediction algorithms must comply with regulations that ensure the legitimate protection of user privacy (Ashok et al., 2022). In parallel with these new processes, new limits must be established to avoid the risk that, through initiatives to modify mass behavior, governments achieve non-legitimate or unlawful objectives justified by states of emergency or national security concerns (Susanto et al., 2021). Therefore, the influence of surveillance capitalism on the uses and practices of BDS techniques should be studied in depth (Zuboff, 2019b).
In the present study, we observed that the existing relationships between user privacy, the risks of personal data management, and the promotion of actions that can modify citizens' behavior are just some of the challenges that researchers should study in depth in the medium and long term (Benefo et al., 2022). In addition to automation and the exponential development of AI, these new technologies must be regulated in advance because, while technology advances exponentially, legislation and its development entail longer time horizons (Di Vaio et al., 2022). Governments must be aware of this weakness and should understand and develop legislation related to AI and its possible unethical uses well in advance (Ribeiro-Navarrete et al., 2021).
Likewise, the efficiency of AI-based strategies, as well as their risks and economic benefits, must be correctly defined and classified. If these issues are closely attended to, information and data processing and the improvement of decision making by governments (Nasseef et al., 2021) can be correctly designed in practice. From the perspective of citizens' benefits, and of their relationship with public institutions, these uses should be correctly implemented in government strategies. Governments must ensure that society can trust the uses of AI in relation to massive behavior analysis and collective intelligence.
Therefore, considering the points outlined above, and seeking to structure future BDS exploration and contribute to collective behavior analysis strategies for society, we present an agenda of research questions that must be answered in further research on AI implemented by governments and user privacy (see Table 11).

Theoretical implications
The first contribution of the present study is that our results bridge a gap in the literature, which, until now, has lacked a thorough application of the concept of BDS to the study of decision making by governments using AI. Accordingly, our results can be used by other academics to design new research on user privacy, predicting user behavior, or optimizing decision making in governments. More specifically, this study provides theoretical information related to collective governance and the improvement of government services with the use of AI. In addition, our results suggest that governments must participate in the development and application of AI-related regulations. It must be understood and theorized that, through the development of public policy, easily measurable strategies and processes should be established. Decision-making deployment should focus on a safe improvement of AI applications. The development of AI must be linked to the legislation and regulation of government relations with third parties, as well as with the companies that collect data and transfer those data to public institutions.
Similarly, ethical governance is a key element for governments to follow ethical practices and monitor the establishment of new AI functions to predict the behavior of society.Citizens' surveillance and the legitimate use of AI by governments close the cycle of identification and classification of the main uses and practices that governments perform to date in relation to AI.
In addition, from the perspective of collective behavior analysis, the present study theoretically discusses the contributions identified as research topics, which can be established both as constructs of quantitative models and as research objectives in further exploratory or qualitative research.In this way, the study contributions can be used to establish and design new approaches that use exploratory methods that work with AI to make predictions for this field of research.The emerging development of AI and the analysis of collective behavior linked to the privacy of citizens become relevant issues for the next decade, since AI applications and development will be exponential in this time horizon.
In addition, our findings from the interviews with members of the government allowed us to identify the main concerns related to user privacy and use of AI from the point of view of both governments and citizens.The present study also contributes to the ongoing debates about surveillance capitalism, and how user data can become the core of economic initiatives and stimulations of the global economy.Furthermore, through the development of interviews, we interpreted the informants' points of view about AI and its uses in public institutions, as well as linked the results to the main initiatives and applications of AI developed to date, in relation to both economic impulses and predictions of mass behavior.

Implications for governments
The results of the present study provide several practical implications for government.First, governments can use our findings as a reference to the main uses of AI previously discussed in the literature.Also, based on our analysis of privacy and ethical issues, governments can take into account the future research agenda proposed in the present study in order to avoid possible data breaches, violations of citizens' privacy, or abuses in the handling of their data.Therefore, governments need to be aware of the challenges and risks of using AI and BDS techniques to predict societal behavior.
Furthermore, governments can use the results of the present study to better understand the main applications of AI and how they should both respect the source of data collection and ensure appropriate use of predictive tools that do not violate user privacy.In particular, governments can use the proposed research agenda as a roadmap for the development of AI strategies in their policies and interactions with citizens.The future questions presented in the research agenda should be considered by governments and public agents to regulate the AI industry, avoid the use of unethical actions linked to BDS, and, above all, better understand the concept of surveillance capitalism and how AI actions can violate citizens' human rights.
Finally, governments should deploy regulatory decisions regarding user privacy on the Internet, the management of data, and the legitimacy of studying collective behavior. Likewise, governments can consult the studies reviewed in the present research to analyze detailed case studies that report preliminary AI results on behavioral prediction, crime anticipation, and the prediction of economic and social movements or health alerts.

Limitations
The limitations of the present study are related to the methodological approaches we used. Furthermore, since the object of our investigation is a fast-developing field, some of our conclusions might eventually become outdated. Other limitations include the relatively small number of interviewed informants, as well as the fact that the research was limited to only one country (Spain). Of note, the study includes articles only in English; thus, valid research in other languages may have been left out of the review. Also, the data-mining approach with LDA is exploratory, and the names of the topics were chosen based on the results of exploratory research. Moreover, algorithms working with machine learning can improve their efficiency through training. In future studies, it will be necessary to address the research questions outlined in the proposed agenda in order to answer the questions identified as priorities in this research topic.

Conclusions
In the present study, we explored the concept of BDS linked to privacy issues when governments develop strategies using AI. To this end, we performed a systematic review of the literature and collected the major academic contributions published to date. In addition, we conducted 15 semi-structured interviews with experts working in public administrations and governments, and two data-mining approaches were used to analyze the collected interview data. Based on the results, we formulated an agenda for further research in the studied area.
With regard to RQ1 (What kind of citizens' privacy issues are expected when governments use behavioral-based AI in their strategies?), we classified the risks to citizens' privacy according to the types of AI-focused strategies used by governments. These issues were analyzed and incorporated into the proposed future research agenda. Furthermore, concerning RQ2 (What AI techniques can governments develop to predict society's behavior?), we used our systematic literature review to identify the main uses that governments make of AI. Based on these findings, we also discussed possible applications from the perspective of collective behavior analysis.
In addition, with a particular focus on user privacy, we also defined different perspectives for the analysis of user behavior and privacy violations. To this end, we identified several major topics in our data. With regard to the last objective of the present study, we established future guidelines to address the challenges of further research on different areas and applications of AI by governments, the secure preservation of data, and user privacy. Therefore, our results revealed the main uses of AI by governments and how, through AI models and algorithms that work with machine learning, they focus on indicators to improve interaction with citizens, organization in cities, the services provided, or the economy.
Similarly, we discussed the importance of knowledge of citizens' behavior for the success of good governance. The main issues related to the use of AI and the processes of population control and massive monitoring were critically reviewed. In addition, we also discussed the activities that governments perform in states of alarm in relation to the prevention of possible terrorist attacks or cyberattacks, where governments can use AI tools to defend the interests of the country. Additionally, we critically reviewed and discussed the kinds of actions and decision-making processes carried out by governments in states of exception. Using these tools, governments can promote AI-based actions, justified on social grounds, that may not respect the privacy of citizens while favoring economic factors for governments, public institutions, or interested third parties. Finally, the role of national intelligence in the analysis of collective behavior, and the initiatives that governments can implement to ensure that their actions are legitimate and that the population supports and understands them correctly, were also highlighted.
Finally, the development of regulations focused on the ethical design of user/citizen data collection and management strategies is not progressing at the same speed as technology. This elicits serious concerns about the privacy of users and the abuse of citizens' behavior through BDS techniques. Governments must implement new actions focused on regulating the security, ethics, and privacy of users' data. The risk of modifying citizens' behavior is real, so new legislation and rules must be implemented to regulate the use of citizens' data in the optimization of models, the development of intelligent systems, and actions to optimize industries or services.

Table 1
Main concepts linked to the analysis of behavioral data.

Table 2
Data sources that can be used by the government to deploy AI strategies.

Table 3
Relevant papers found in the literature review.
▪ Investigating how data intelligence improves decision making in the public sector

Table 4
Interviewees by role of informant, industry, professional status, organization, and nationality. *The Business Confederation of Madrid (CEIM) advises the national government of Spain. *The League of Arab States is a regional organization that aims to safeguard independence and sovereignty and to consider in a general way the affairs and interests of the Arab countries. *PP (Partido Popular) is a Spanish political party; currently, it is the opposition party and controls 6 of 19 regional governments in Spain. *PSOE (Partido Socialista Obrero Español) is the Spanish political party currently in government.

Table 5
Demographic characteristics of the informants.

Table 6
Main uses of AI by governments found in the literature review.

Table 8
Topics identified using LDA.

Table 9
Keywords grouped by relevance.

Table 11
Future research questions on user privacy and AI strategies deployed by governments.