Observing political and societal changes in Finnish parliamentary speech data, 1980–2010, with topic modelling

ABSTRACT Parliamentary speech reflects many events, changes and developments in society, as well as shaping them by influencing legislation and public interest. Knowing what topics have been dominant in parliamentary discussions can reveal what was considered important at the time the speeches were given. This knowledge can be achieved computationally with topic modelling, which can identify latent topics in large numbers of texts. The method is still underused in parliamentary studies and has previously been used only once with Finnish parliamentary speeches. This article aims to create and validate a topic model offering a robust overview of Finnish parliamentary speeches from 1980 to 2010, and to demonstrate the validity of the model by examining peaks in topic occurrences and comparing them to the historical and societal context of the time. The topics ‘energy’, ‘employment’ and ‘democracy’ were selected for closer inspection.


Introduction
Topics discussed in parliament both reflect what is considered important in society at the time and shape wider societal discussions. The occurrence of these topics can vary over time and between political parties. Some topics, such as budget debates, government formation and interpellations, emerge periodically and form a kind of parliamentary calendar. Other topics may vary due to historical events and other factors. This article studies changes in topics as a way to observe the development of parliamentary interest during the major societal changes that have made Finland increasingly international, such as the rise and rapid expansion of the Internet and Finland joining the European Union. The model will be used as a basis for further analysis, where compact size, easy interpretability and general robustness are the preferred qualities.
Topic modelling is a method of unsupervised machine learning, a statistical tool for discovering abstract topics in a collection of texts. It takes in a set of texts, uses a probabilistic framework to infer relationships between words and assigns words to certain topics, then assigns these topics to the texts. The topics defined by a topic model are abstract, which means that the researcher must give meaningful labels to the topics afterwards, as a model cannot understand the text it has processed. The topics are also often broader and not as fine-grained as topics defined by human readers. The power of topic modelling becomes apparent when the number of texts is so large that it is impossible for a human reader to process everything in a reasonable amount of time. Topic modelling can process millions of words in mere hours.
Studies of parliamentary speeches can reveal the impact they have on our society and help monitor the use of power; consequently, there have been many studies from various perspectives. Numerous studies have examined the occurrence and aspects of a single theme or topic, such as illegal drugs, higher education, or the war against ISIS. Topic modelling is a fairly recent development in the context of studying parliamentary speeches. Most international studies have used topic modelling with parliamentary data in English 1 but studies in other languages are also emerging as digitized materials become available. 2 Topic modelling has previously been used on Finnish parliamentary speech only by Loukasmäki & Makkonen, 3 who analyzed how the topics of the government and opposition differ from each other. Other types of quantitative studies on Finnish parliamentary speech have been conducted by Elo, 4 Kettunen & la Mela, 5 and Simola. 6 This scarcity is explained by the fact that Finnish parliamentary speech data has not previously been available in a computer-readable format.
The Finnish parliament, the Eduskunta, is a public arena where individual MPs can freely access the floor. Although factions can exercise some level of control, depending on the saliency of the issue, limited debating time, or the expertise required for a proper contribution, they generally respect their members' right to speak. 7 A major parliamentary reform in Finland in the 1990s sought to elevate the plenary session to one of the most important forums for national politics and debates on topical political matters. 8 Since then, debates on topical matters have been able to be held at short notice, and reports and information the government is constitutionally obliged to provide to the Eduskunta have been debated in the plenary. 9 Since our data covers both pre- and post-reform plenary debates, it is interesting to see whether the reform has somehow affected the topic structure.
Although we do not deny the impact and influence of parliamentary culture and rules on plenary debates, our primary interest is in the 'politics vocabulary' used in parliamentary debates to discuss topical matters. Here we follow Palonen's ideas about 'politicization' as a speech act marking a change or even a break in relation to politics and opening up a spectrum of opportunities. The core point here is that no phenomenon is political by nature; phenomena become political once someone politicizes them, i.e. interprets them as political. 10 Departing from this point, we understand plenary debates as speech acts of politicization: taking up a question or matter in a plenary debate politicizes it and opens it up for opportunities.
In this article, we seek to answer the following research question: Is LDA (latent Dirichlet allocation) based topic modelling a reliable research tool for exploring the themes and dynamics of plenary debates? From this perspective, this article has two main objectives. First, it seeks to provide evidence that LDA is capable of producing meaningful and reliable results also when applied to non-English textual corpora. By meaningful we mean a topic structure in which a majority of the topic labels can be easily determined from the most influential content words. By reliable we mean that most topics can be explained by the main political, societal, and historical developments. Here we lean heavily on a recent study by Ahonen and Koljonen 11 exploring transitory and resilient topics in Finnish party manifestos by applying LDA topic modelling.
The article proceeds in three steps. In the first section, we discuss our data and the workflow of data pre-processing, as well as describe general questions related to LDA topic modelling and the model validation. The second section is dedicated to the presentation of our empirical results. This section starts with the presentation of our LDA model, followed by a critical assessment and validation of the model. The latter part of this section presents empirical evidence for the reliability and validity of the model by critically reflecting selected topics against the context of Finnish political history. The article is summarized with concluding remarks focusing on the main findings in a theoretical, methodological, and empirical sense.

Data and methods
Dataset and pre-processing

The Finnish parliamentary speech dataset used in this study has been digitized and made computer readable in the Semantic Parliament project (SEMPARL). 12 The full dataset covers all transcribed speeches given in plenary sessions in the Finnish parliament (years 1907-2021 in full at the time of writing), containing almost 1.2 million speeches and over 200 million words. The 1980-2010 subset of speeches modelled in this paper consisted of around 476,000 speeches and 90 million words, but was reduced to around 454,000 speeches and 25 million words during pre-processing. The pre-processing consisted of seven main steps: lemmatization and part-of-speech tagging of all words; removing Swedish-language sentences; removing parts of speech other than nouns, verbs and adjectives; removing speeches by the Speaker of Parliament; removing speeches shorter than 5 words; removing words with no letters; and removing words with a corpus frequency of less than 5 or over 57,000. Three additional steps were used to create four slightly differing wordsets or dictionaries, referred to later as d1-d4. These additional steps were: removing words with a document frequency of less than 5 or over 57,000; allowing some exception words to remain; and removing some additional stop-words common to parliamentary speech.
The main pre-processing steps were based on known best practices. Although stemming has been noted not to improve the results of topic models, 13 lemmatizing gives a more accurate image of speech, 14 especially in highly inflectional languages such as Finnish (for example, a single Finnish noun can theoretically have around 2,200 different inflectional forms). Lemmatizing also decreases the list of words used by the model (the model dictionary), the model size and the runtime. In this study the speeches were lemmatized and annotated with part-of-speech and dependency relation tags using the Turku Neural Parser, which works extremely well with the Finnish language. 15 Only nouns, verbs and adjectives were included in the model, for the sake of clarity and because of time constraints. Finland has two official languages, Finnish and Swedish, but since multilingual topic models are still under development, 16 only Finnish sentences were added to the final corpus. Languages were automatically determined with Python's langdetect library, which is around 93% accurate. 17 Speeches by the Speaker of Parliament were excluded, since most of them can be considered uninformative. Words that did not contain any letters were also left out. Topic models generally perform poorly with very short texts, so all speeches shorter than five words were removed. Words that occur rarely or extremely often do not have much impact on the model and can be removed to make the model more compact. 18 The recommended way to conduct vocabulary pruning is to remove words with a document frequency under 0.5% or over 99%, 19 but our method was a little less strict: words with a corpus frequency (the number of times a word occurs in the entire collection of documents) of less than five or more than 57,000 were removed.
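The frequency-based filtering steps above can be sketched in plain Python. The thresholds mirror those reported (corpus frequency 5-57,000, minimum speech length 5 words); the function name and structure are illustrative only and are not the actual SEMPARL pipeline, which also involves lemmatization and language detection beforehand.

```python
from collections import Counter

# Thresholds as reported in the text; purely illustrative constants.
MIN_SPEECH_LEN = 5
MIN_CORPUS_FREQ = 5
MAX_CORPUS_FREQ = 57_000

def preprocess(speeches):
    """Prune a list of lemmatized speeches (each a list of tokens)."""
    # Keep only tokens that contain at least one letter.
    speeches = [[w for w in s if any(ch.isalpha() for ch in w)]
                for s in speeches]
    # Drop very short speeches.
    speeches = [s for s in speeches if len(s) >= MIN_SPEECH_LEN]
    # Corpus frequency = total occurrences across all remaining speeches.
    freq = Counter(w for s in speeches for w in s)
    keep = {w for w, c in freq.items()
            if MIN_CORPUS_FREQ <= c <= MAX_CORPUS_FREQ}
    return [[w for w in s if w in keep] for s in speeches]
```

In practice the same pruning could be done with Gensim's dictionary utilities, but the logic is the same: length filters first, then frequency filters on the surviving corpus.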

Topic model: latent Dirichlet allocation (LDA)
Several different kinds of topic models exist and more are continuously being developed. The most famous and widely used model is latent Dirichlet allocation (LDA), 20 which was also used in this study. The version used here was from the Python library Gensim, 21 which uses an online variational Bayes algorithm. 22 In practice, a corpus of documents is input into the LDA programme and the result is a model where every document is represented by a distribution of topics, and every topic is represented by a distribution of words.
LDA learns the document-topic and topic-word distributions in an unsupervised manner, but these distributions are guided by two hyper-parameters that can be optimized; in Gensim these are called alpha (for document-topic distribution) and eta (for topic-word distribution). The third and perhaps the most important parameter that needs to be optimized is the number of topics that LDA will create, usually represented by K. Gensim can learn good alpha and eta values automatically from the data, but K always needs to be given. Parameter optimization requires creating several different LDA models and comparing them. Altogether around 200 models were created for this purpose and 12 were chosen for closer inspection.
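To make explicit what the hyperparameters control, the standard LDA generative process (textbook notation, not specific to the Gensim implementation) can be written as:

```latex
\begin{aligned}
\theta_d &\sim \mathrm{Dirichlet}(\alpha) && \text{topic distribution of document } d\\
\varphi_k &\sim \mathrm{Dirichlet}(\eta) && \text{word distribution of topic } k,\; k = 1,\dots,K\\
z_{d,n} &\sim \mathrm{Multinomial}(\theta_d) && \text{topic of the } n\text{-th word of document } d\\
w_{d,n} &\sim \mathrm{Multinomial}(\varphi_{z_{d,n}}) && \text{the observed word}
\end{aligned}
```

Thus alpha governs how concentrated each document's topic mixture is, eta governs how concentrated each topic's vocabulary is, and K fixes the number of topic distributions to be learned.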
Validation: coherence scores, close reading and classification tasks

The 12 candidate models were chosen based on coherence scores and the best model was chosen based on topic interpretability. Finally, to make certain that the chosen LDA model was adequate, it was validated with two classification tasks.
The most popular method for computational evaluation of topic models used to be perplexity but since it was found not to correlate with topic interpretability 23 it has been replaced by coherence. Newman et al. 24 define topic coherence simply as semantic relatedness, but it can be understood as words appearing in similar contexts. Coherence scores are calculated for individual topics and their average can represent the coherence of an entire topic model. Coherence intrinsically favours a smaller number of large topics over a large number of small topics, 25 which suits the aim of acquiring relatively broad topics and a compact model.
According to Gensim's documentation, the formulas and pipeline used in Gensim's coherence model are from Röder et al. 26 Gensim can calculate four different coherence scores: C_v, C_uci, C_npmi and C_UMass. As explained by Röder, each coherence score is calculated in four steps: segmentation, probability estimation, confirmation measure, and aggregation. 27 Segmentation defines which parts of the texts are compared to each other: C_uci, C_npmi and C_UMass segment and compare single word pairs, while C_v compares single words to the entire word set using word context vectors. Probability estimation estimates how probable it is for the words or word-sets to co-occur: C_UMass simply counts the number of documents where the word occurs and divides it by the total number of documents (the Boolean document method); C_uci, C_npmi and C_v use the Boolean sliding window method, where a window of n words moves across the entire text one word at a time and creates n-word pseudo-documents, for which the probability is calculated in a similar way to full documents. The confirmation measure uses these probabilities to compute semantic support for the word pairs and sets. There are two kinds of confirmation measures: direct and indirect. The benefit of the indirect measure is that it detects semantic support between words that do not often occur together (for example, 'skiing' and 'cycling') but do co-occur with similar word-sets (for example, {'sports', 'equipment', 'speed'}). Direct measures are used in C_uci (log-ratio), C_npmi (normalized log-ratio) and C_UMass (log-conditional), while C_v uses an indirect measure (cosine similarity + normalized pointwise mutual information). Aggregation simply combines the confirmation measures into a single coherence score; all scores use the arithmetic mean to do this.
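To make the four-step pipeline concrete, the following is a minimal pure-Python sketch of a direct NPMI confirmation measure with Boolean sliding-window probability estimation and arithmetic-mean aggregation. It is a simplified illustration of the idea behind C_npmi; Gensim's actual implementation differs in details such as window handling, smoothing and top-word selection.

```python
import math
from itertools import combinations

def npmi_coherence(topic_words, documents, window=10, eps=1e-12):
    """Average NPMI over all pairs of topic words (simplified sketch)."""
    # Probability estimation: every run of `window` consecutive tokens
    # becomes a pseudo-document, represented as a set of words.
    windows = []
    for doc in documents:
        if len(doc) <= window:
            windows.append(set(doc))
        else:
            for i in range(len(doc) - window + 1):
                windows.append(set(doc[i:i + window]))
    n = len(windows)

    def p(*words):
        # Fraction of windows containing all the given words.
        return sum(1 for w in windows if all(x in w for x in words)) / n

    # Segmentation + direct confirmation: normalized PMI per word pair.
    scores = []
    for wi, wj in combinations(topic_words, 2):
        pij = p(wi, wj)
        pmi = math.log((pij + eps) / (p(wi) * p(wj) + eps))
        scores.append(pmi / -math.log(pij + eps))
    # Aggregation: arithmetic mean over all pairs.
    return sum(scores) / len(scores)
```

Words that always co-occur score near 0 (the NPMI maximum of +1 requires co-occurrence to be rarer than the individual words), while words that never share a window score near -1.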
C_v coherence has been found to correlate best with human ratings of topics. 28 Because C_v was also used by Loukasmäki and Makkonen 29 as well as many others, 30 it was chosen as the main scoring formula for this paper (Gensim's default window size of 110 was used). However, the original implementation of C_v has been contested, 31 so the other three coherence scores were used to complement the results of C_v.
The highest C_v coherence scores usually indicate the best models but, as mentioned previously, coherence tends to favour small numbers of topics. The dataset used in this study is quite large and covers a long period of time, so statistically one would expect to find many topics. However, very fine-grained topics would be unsuitable for the purpose of obtaining an overview of the data, and a complex model would make further use challenging. In addition, models with high numbers of topics usually include many split and low-quality topics, which would make diachronic changes very difficult to interpret.
Human evaluation is still considered the most reliable method of determining whether or not topics make sense, 32 and it sometimes differs from the results of automated metrics such as coherence. 33 Thus, coherence alone could not be used to choose the best model, only to select the candidate models. Each topic in each candidate model was then ranked in quality by close reading the 25 most probable words for each topic. Three simple ranking categories were used: a topic was labelled 'mixed' if it contained words clearly belonging to two or more topics; if two or more topics could clearly be combined into one, the topics were labelled as 'split'; and the label 'junk' was used if no unifying theme could be found or if the topic was considered too general (for example, a topic about times and time changes).
The model that proportionally had the most 'good' topics was chosen for two classification tests. Loukasmäki & Makkonen 34 partly used the same data as this study and reported minor differences in topic choices between government and opposition speakers when examining simple sums of topic probabilities across all speeches. Following this, the first test classified speeches as either government or opposition on the basis of their topic distributions. In the second test, a classifier predicted the topics of speeches from their vector representations; the best-classified topics were then compared to the topics with the highest coherence scores. The idea was that if the topics were of good quality and coherent, they should also be easier to classify.
For the first classification test an XGBoost 35 model was trained with speech topic vectors. XGBoost is a gradient-boosted decision tree (GBDT) machine learning library and is currently one of the most powerful and most used machine learning algorithms. It makes its predictions by building many small, weak tree models and combining their results. The version used in this study was the XGBoost API with Python's scikit-learn interface. 36 XGBoost requires considerable parameter tuning, which is time consuming with large models; therefore, for the second test a simpler classification tool was used, namely Random Forest, 37 also from Python's scikit-learn.
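The validation idea itself is independent of the chosen classifier: speeches are represented as topic-probability vectors with government/opposition labels, a model is fitted, and held-out accuracy is inspected. As a library-free illustration (the paper used XGBoost and Random Forest, not this method; all names here are ours), a minimal nearest-centroid sketch:

```python
import math

def fit_centroids(topic_vectors, labels):
    """Compute a mean topic vector per class (e.g. 'gov' / 'opp')."""
    centroids = {}
    for label in set(labels):
        rows = [v for v, lab in zip(topic_vectors, labels) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, vector):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], vector))
```

If government and opposition speeches occupied clearly different regions of the topic simplex, even such a crude classifier would separate them well; the modest accuracies reported below suggest they largely do not.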

LDA model and validation
The initial tests (shown in Figure 1) showed that models within the range of 20-60 topics had relatively high coherence scores. A point of interest was also found around 140 topics, where C_v, C_npmi and C_uci made a small peak in the graph, but this number of topics was deemed too high for the purposes of this study. We then created 180 models with 15-60 topics using the four dictionaries. Twelve candidate models were selected based on peak coherence scores and their topics were interpreted and labelled as either good, mixed, split or junk. Candidate model details are listed in Table 1. A model with 26 topics made with dictionary d4 (t26d4) was chosen for further analysis as it proportionally contained the highest number of good-quality topics (81%): only the topics 'social benefits' and 'social problems' were considered two sides of one comprehensive topic, 'social care', and thus labelled as split, and two other topics were considered junk but interpreted as 'parliamentary factions' and 'general'. The top 10 topic word lists and interpretations of model t26d4 are presented in Appendix 1.
It should be noted that all candidate models, which had very similar numbers of topics, had quite similar topics, just organized and mixed differently. All candidate models had topics that could be interpreted in the following categorisations: 'taxation', 'legislation', 'agriculture', 'budget', 'education', 'employment', 'energy' and 'social and health care'. Some models included interesting topics that were not present in the chosen model t26d4. For example, several models had topics that could be interpreted as 'value judgement', which would be an interesting point of study on its own.
A classical dividing line in parliamentary debates has often been that between the governmental parties and the parliamentary opposition. Here the underlying assumption is that the government vs. opposition polarization in plenary debates reflects the overall party polarization, so that in countries with strong party polarization the government vs. opposition polarization will also be stronger. 38 As the Finnish parliamentary debating culture is rather cautious and polite, 39 we expected the debates to reveal no strong polarization between the government and opposition.
Two classification tasks were then carried out in order to validate the chosen topic model. First, document topic vectors were fed into an XGBoost classifier to see if it could differentiate between speeches by the government and the opposition. Speeches with unknown government/opposition status (3.2%) were removed from the data before classification. The resulting classification accuracy for t26d4 with speech topic distributions as data was 0.62, which is somewhat low but can be explained by the long time span, which results in ample variation. Usually a low number of topics results in a lower resolution, so the comparison models t40d7 and t140d5, which also had decent coherence scores, were classified in a similar manner to see if a higher number of topics would increase the accuracy. Initially, t40d7 was classified with XGBoost, which produced an accuracy of 0.62, the same as the chosen 26-topic model. Then all three models were classified with Random Forest, which produced an accuracy score of 0.61 for each model. This consistency suggests that the constricted number of topics should not be an issue in this case. Moreover, the topic differences between the government and opposition observed by Loukasmäki & Makkonen 40 were not very remarkable, so highly accurate classification could not be expected. More detailed results of the government/opposition XGBoost classifications can be seen in Table 2.
Next, an attempt was made to predict the topics of speeches from the vector representation of the corpus, in which the vocabulary was already pruned and represented as numbers. A Random Forest classifier was used and resulted in an accuracy of 0.68. Some topics were nearly always predicted correctly (f1 score over 0.95): 'voting', 'question time', 'law proposals' and 'social and health care'. These topics probably consist of vocabulary that is not present in other topics. Other well-predicted topics (f1 score 0.67-0.76) were 'general', 'democracy', 'development cooperation' and 'traffic and transport'. The classifier had problems with the remaining topics. The full classification report is presented in Table 3.
Changes over time were best observed with yearly topic occurrence graphs, which indicate how present each topic was each year. Two slightly different graphs were created for each topic: one illustrating the proportion of speeches where the topic was the most dominant, i.e. the speech was mostly about the topic in question but many other topics could be almost equally present; the other mapping speeches where, in addition to being the most dominant, the topic was at least 30% present, i.e. the topic was clearly discussed in the speech. In LDA the document-topic distribution is controlled by the α hyperparameter, which was in this case automatically adjusted. The chosen LDA model assigned several topics to each speech, which meant that within one speech no topic could be present with a very high percentage. Consequently, some speeches did not have a single topic that was 30% present or more. Using this topic presence threshold resulted in fewer speeches being registered in the yearly topic graphs compared to using no threshold. All topic occurrence graphs with and without the threshold can be seen in Appendix 2.
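The two variants of the occurrence graphs can be computed as follows. This is an illustrative sketch (the function and variable names are ours, not from the paper's code): each speech is represented as a mapping from topic ids to probabilities, and setting `threshold=0.0` reproduces the no-threshold variant.

```python
def yearly_topic_share(doc_topics, years, topic_id, threshold=0.30):
    """Per-year share of speeches whose dominant topic is `topic_id`
    and whose dominant-topic weight is at least `threshold`."""
    totals, hits = {}, {}
    for dist, year in zip(doc_topics, years):
        totals[year] = totals.get(year, 0) + 1
        # The dominant topic is the one with the highest probability.
        dominant = max(dist, key=dist.get)
        if dominant == topic_id and dist[dominant] >= threshold:
            hits[year] = hits.get(year, 0) + 1
    # Share of qualifying speeches among all speeches of that year.
    return {year: hits.get(year, 0) / totals[year] for year in totals}
```

A speech whose dominant topic carries, say, 0.29 of the probability mass counts in the no-threshold graph but not in the 30% graph, which is why the thresholded graphs register fewer speeches.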
The 'general' topic was the most dominant topic overall (around 30%-55% each year), which was expected. The second most dominant topic overall was 'public sector' (12%-27%), and the third was 'law proposal' (2%-12%). All other topics remained under 10% each year, and several were present at less than 1% each year.

Embedding LDA topics to the Finnish contemporary history
Generally speaking, the topics identified by LDA were, as the human-decided labels indicate, quite unambiguous and politically meaningful. Further, for most topics it is rather easy to establish connections between the topic and its supposed context. From this perspective, the results from the topic modelling support the assumption that topics identified by distant reading are sufficiently reliable for describing the main themes of the plenary debates in the Finnish Parliament. Here we strongly rely on the common understanding of plenary speeches as a communicative act aiming at strengthening the connection between MPs and citizens, but also as a struggle between different political opinions. 41 In order to better assess the quality of the topic modelling, we selected three topics for a closer, qualitative analysis. We used the following three criteria for topic selection:
1. The occurrence of the topic showed significant variation with clear peaks, thus capturing the ebb and flow typical of parliamentary debates connected to topical issues. In other words, themes come and go in normal society, and this is reflected by plenary debates focusing on topical questions.
2. Words with the highest probabilities can be assigned to a certain theme. This means that words used to discuss the topic in plenary sessions can be contextualized.
3. The themes should be relevant to the contemporary history of Finnish politics.
Strictly speaking, this is not a computational but an interpretational criterion. Here the idea was to demonstrate the usefulness of computational methods for the study of Finnish contemporary history by selecting specific topics considered central to Finnish politics between 1980 and 2010.
The topics selected based on these criteria are topic #2, labelled 'energy', topic #7, labelled 'employment', and topic #17, labelled 'democracy'. There were many other topics fitting the above-defined criteria, but we considered these three well suited to exemplifying the power of topic modelling in capturing the dynamics of intensive political debates. The qualitative analysis of each topic follows the same two steps. First, we describe and assess the core vocabulary of the topic by considering it from the perspective of the wider political context. This step is important for demonstrating how well the model captures the main content of plenary debates on the topic in question. Second, we discuss the variation over time, paying close attention to the peaks. Compared to step 1, this step focuses on whether the variation identified by LDA is true in the sense that it corresponds to the real intensity of political discourse on the topic. We focus on the peaks on the right-hand side of the occurrence diagrams (Appendix 2), illustrating the annual share of speeches where each speech has at least 30% of its contents allocated to the topic in question. Since many plenary speeches address several issues, we considered this threshold the most appropriate.

Energy
The core vocabulary of the ten most used words of topic #2, 'energy' (see Appendix 1), tackles a wide range of issues related not only to energy production (wood, electricity, nuclear power) but also to issues associated with, for example, climate change (know-how, renewable, emission). Thus, plenary debates on energy questions seem to reflect a re-contextualization of energy debates from purely technical questions to a context framed by global climate change, 42 as the following excerpt exemplifies: (1) 'Let's put clear adverse taxes on coal and heavy fuel oil. [So …] we will also commit ourselves to the Rio Environment Agreement in a genuine way.
[…] It is quite clear that once it is signed, we will have to make political actions that show that Finland is genuinely committed to it. A basic energy solution based on coal will certainly destroy Finland's reputation as a signatory to the Rio Environment Agreement'.
In a similar manner, the 'Kyoto Protocol' 44 was invoked in the debates: (2) 'Restoring emissions to year [19]90 levels cannot be achieved without nuclear power construction, unless the use of coal and peat is completely replaced by natural gas and wood. This is probably not realistic in the time frame set'.

Employment
The second topic, topic #7 with the label 'employment', is characterized by words typical of a traditional speech on labour force policy. The 10 most important words consist of both labour-policy-related actors ('työn#tekijä' (employee), 'työn#antaja' (employer), 'työtön' (unemployed), 'nuori' (young person), and 'työ#voima' (workforce)) and structural and/or societal concepts ('työ#paikka' (workplace), 'työ#elämä' (work), 'työttömyys' (unemployment), 'työllisyys' (employment), and 'palkka' (pay/salary)). Such vocabulary is rather typical and common in labour policy related discourses. Furthermore, the very traditional setting of the core vocabulary might also indicate that the main content of plenary debates on labour policy has remained rather stable and focused on very traditional questions and challenges of labour force policy. This is quite an interesting observation, especially considering the profound changes the Finnish economy underwent from 1980 to 2010, including integration into the global economy, the deep recession in the early 1990s, joining the EU in 1995 and the single currency, the euro, in 2002. 47 Compared to energy-related plenary debates, the timeline of employment-related debates is rather different, with one clear peak in 1995-1996. This peak is a distinct aftermath of the great depression in Finland in the early 1990s. The depression was at least partly caused by the extensive de-regulation of the Finnish financial system in the late 1980s and the collapse of Soviet trade in 1991. 48 During the period from 1992 to 1997 Finland suffered from extremely high unemployment, with an average unemployment rate of approximately 17%. The rapid economic collapse demanded extraordinary decisions by the government, 49 resulting in heated political clashes between the government and opposition in plenary debates.
A good example is the speech of MP Esa Lahtela (5) (Social Democrats, government) on 26 April 1995, in which Lahtela demanded clearer facts from the opposition parties about the savings the opposition were insisting on, framing these cuts as an attack on the welfare state. Overall, considering the centrality of labour politics in debates between the government and opposition, it is no great surprise that the occurrence of the topic correlates with the unemployment rate. These concerns were exacerbated not only by increasing social segregation, a growing share of underprivileged people and social groups threatened with poverty, but also by government-led crisis management strengthening the role of the executive. 50 These concerns in fact confirm findings from previous studies stressing the correlation between the personal economic situation of citizens and their overall satisfaction with democracy or the political system in general. 51

Democracy
The last topic, topic #17 on 'democracy', revolves heavily around constitutional concepts ('perustus#laki' (constitution), 'perustus#laki#valio#kunta' (Constitutional Law Committee, often used together with 'jäsen' (member)), 'tasa#valta' (republic), 'demokratia' (democracy), 'valta' (power)) and democratic institutions ('presidentti' (president), 'valtio#neuvosto' (the [Finnish] Government), 'vaalit' (elections)). When considered from the perspective of the temporal context, each of the peaks of 1987, 1994, and 1999 can be relatively easily explained by contextual factors. The elections in 1987 marked a certain turning point in Finnish political history; this was the first year after the long reign of President Urho Kekkonen, and especially the conservatives advocated for a reform of the presidential election system based on a direct popular election: (6) 'The National Coalition Party pushes for a direct, if necessary two-stage, popular election of the President of the Republic. Such an electoral system would be the most democratic and at the same time the clearest-cut option for the electorate'.
The National Coalition Party represented by MP Zyskowicz returned to governmental power in 1987 by forming a pathbreaking coalition government between the conservatives and social democrats under PM Harri Holkeri. Since this coalition was considered unexpected, as a non-socialist coalition was rated more probable, the intervening role of the social democratic President Mauno Koivisto was intensively debated and he was criticized for overstretching his democratic powers. The presidential election reform was at least partly connected to this particular government formation, but also illustrated a longer political development away from a presidential system to a parliamentary democracy.
Plenary debates on democracy during the second peak, in 1994, revolved around the approaching referendum on Finland's EU membership. Since it was only the second referendum in Finland after 1917, the question quite understandably resulted in politically heated discussions; it was also used to politicize questions of representative and direct democracy in Finland, as the speech of MP Heidi Hautala (Green League) on 17 May 1994 exemplifies well: (7) 'I propose that Finland should, in addition to consultative referendums, start organizing binding referenda as well.
[…] Direct democracy is very undeveloped in Finland. The public, after all, has extremely little political influence. Our constitution is highly representative'.
The last peak, in 1999, is clearly connected to the constitutional reform in Finland. The constitutional reform process was started in 1995, with the expert committee completing its work in 1997 and the Constitutional Law Committee of the Finnish Eduskunta considering the report in 1998. The Constitutional Law Committee submitted its unanimous report on the bill to parliament in January 1999, followed by a plenary hearing in February 1999 by the outgoing parliament and the final approval by the newly elected parliament in June 1999. The new constitution entered into force on 1 March 2000. Interestingly, debates during this peak continued, as the speech of MP Johannes Leppänen (Centre Party) on 1 June 1999 exemplifies, the earlier plenary debates on presidential elections in 1987 and on the role of referenda and direct democracy in Finland: (8) 'The basis of our Constitution is representative democracy.
[…] New, timely ways to participate must be developed in order to support the representative democracy. Without a functioning civil society, no democracy will prosper'.
Overall, the empirical analysis of the three selected topics provides evidence for the reliability of the LDA results. The content analysis of randomly selected original plenary speeches allocated to the topics confirms that the labels used to describe the topics' focus are well in line with the actual content of the speeches. Further, the model also captures the main debates over time quite reliably. The latter is an important observation, as the model should work with highly volatile and dynamic political discussions. Interestingly, the LDA results do not indicate any significant change possibly related to the major constitutional reform in Finland in the 1990s. We do not question the impact of this reform on the role of plenary debates in national politics; however, our results do not indicate any significant change in the intensity of plenary debates. In other words, even prior to the reform, MPs seem to have considered plenary debates as one forum in which to politicize topical matters.
Further, the empirical results support the interpretation that the model used in this paper is able to identify core vocabularies that are robust over time and to use these as a basis for topic allocation. At the same time, the model is not 'misguided' by context-related vocabularies: because the core vocabulary is deeply embedded, the model can also help the researcher to capture discursive dynamics. As we have shown above, the variation of topic occurrence reported as part of the modelling results is a valuable aid in discerning connections between a topic and the underlying political, societal, and historical developments.
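The kind of topic-occurrence variation used above to locate the 1987, 1994 and 1999 peaks can be sketched computationally. The following is a minimal illustration, not the article's actual pipeline: the per-speech topic allocations and the topic/year values are hypothetical toy data, and a real analysis would read them from the fitted model's document-topic distributions.

```python
# Minimal sketch: given per-speech dominant-topic allocations,
# compute each topic's yearly share of speeches and flag peak years.
# All data below are hypothetical toy values.
from collections import Counter, defaultdict

def yearly_topic_share(allocations):
    """allocations: iterable of (year, topic_id) pairs.
    Returns {topic_id: {year: share of that year's speeches}}."""
    per_year = Counter(year for year, _ in allocations)
    counts = defaultdict(Counter)
    for year, topic in allocations:
        counts[topic][year] += 1
    return {t: {y: c / per_year[y] for y, c in ys.items()}
            for t, ys in counts.items()}

def peak_years(series, threshold=0.75):
    """Years where a topic's share reaches `threshold` of its maximum."""
    top = max(series.values())
    return sorted(y for y, v in series.items() if v >= threshold * top)

# Toy example: topic 17 ('democracy') spikes in 1994.
data = [(1993, 17), (1993, 3), (1993, 3),
        (1994, 17), (1994, 17), (1994, 3),
        (1995, 3), (1995, 3), (1995, 17)]
shares = yearly_topic_share(data)
print(peak_years(shares[17]))  # -> [1994]
```

The peak threshold is a free parameter; in practice one would inspect the full time series rather than rely on a single cut-off.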

Conclusions
In this article we examined the applicability of LDA topic modelling to large parliamentary corpora consisting of three decades of plenary debates in the Finnish Eduskunta. The article's main standpoint is rooted in Kari Palonen's concept of politicization as a speech act that labels a question or matter as political, so that it becomes possible to break with present politics and open up new opportunities. Departing from this understanding, our article also considers the topics discussed in plenary debates as having been politicized by MPs, thus reflecting the changing landscape of topical matters.
Our results are significant in three respects. First, the LDA model used to identify topics and their dynamics over time appears to produce reliable, coherent, and analytically meaningful results. It should be noted that the time span is rather long and therefore also subject to natural changes in vocabulary and in the use of language. The best of the 120 models created was chosen by comparing coherence scores and the interpretability of topics. Since the dataset was new, there was no generally accepted baseline against which to compare the model, and because coherence scores depend on context, 52 comparing coherence scores from studies using different datasets was not feasible. Thus, the model was validated by checking that the resulting topic distributions could function as a basis for classification. The results of the topic classification task verified that the topics were reliable and discrete. Although this might be considered a somewhat technical justification, finding a basis for appropriate and meaningful labels fulfils the important task of establishing linkages between the vocabulary of politics used in plenary debates and topic-driven discourses. The fact that such connections could be established based on the results of our LDA topic modelling provides at least one important piece of evidence for the applicability of computational methods to parliamentary speeches.
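The model-selection logic described above can be illustrated in miniature. The sketch below is not the article's implementation: it compares two hypothetical candidate "models" with a simplified UMass-style coherence computed over a tiny toy corpus, whereas the study compared 120 full LDA models (typically done with a library such as gensim).

```python
# Sketch of coherence-based model selection. The toy corpus, the two
# candidate "models" and their top words are hypothetical examples.
import math
from itertools import combinations

def umass_coherence(top_words, docs):
    """Simplified UMass coherence: sum over ordered word pairs of
    log((D(wi, wj) + 1) / D(wi)), where wi precedes wj in the topic's
    top-word list and D counts (co-)document frequencies."""
    doc_sets = [set(d) for d in docs]
    def df(*words):
        return sum(all(w in s for w in words) for s in doc_sets)
    return sum(math.log((df(wi, wj) + 1) / df(wi))
               for wi, wj in combinations(top_words, 2))

def best_model(models, docs):
    """Pick the candidate whose topics have the highest mean coherence.
    models: {name: list of topics, each a list of top words}."""
    def mean_coherence(topics):
        return sum(umass_coherence(t, docs) for t in topics) / len(topics)
    return max(models, key=lambda name: mean_coherence(models[name]))

docs = [["energy", "nuclear", "power"],
        ["employment", "jobs", "unemployment"],
        ["energy", "power", "jobs"]]
models = {
    "k=2": [["energy", "power"], ["employment", "jobs"]],
    "k=2-noisy": [["energy", "jobs"], ["employment", "power"]],
}
print(best_model(models, docs))  # -> k=2
```

As the article notes, a coherence score alone is not sufficient: the winning model must still be checked for interpretability, which is why human labelling and the classification-based validation remain necessary.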
Second, the relatively low accuracy of the government/opposition classification task is a good reflection of the nature of political discourse in Finland: compared to many other countries, Finland has relatively low levels of political polarization. This can be attributed to a number of factors, including a strong tradition of consensus-based decision-making, a relatively homogeneous population, and a multi-party system that encourages cooperation and compromise. In other words, our findings are not only in line with those from previous non-computational studies of Finnish parliamentary debates, but can also be satisfactorily explained by theories on the Finnish parliamentary system.
Third, the empirical, qualitative analysis of the three selected topics (energy, employment, and democracy) supports the reliability of the results from LDA modelling. The study shows that topic modelling is a valid tool for finding themes in political discussions and for examining the timelines of the appearances of these themes over a long period of time. That the peaks of the topics discussed might already be familiar and well known to the general populace does not nullify our results. On the contrary, the results presented in this paper are encouraging, as they indicate that topic modelling is a reliable method for exploring large parliamentary document corpora in order to gain an understanding of the ebb and flow of political themes in parliamentary debates. Moreover, LDA seems to be a powerful tool for identifying which topics have been topical in a certain period of time, and for gaining an understanding of the dynamics of political vocabulary over time. In other words, LDA seems to be a reliable tool for analysing what topics have been politicized and in what period of time.
Although the results from our unsupervised, computational analysis of plenary debates are encouraging in many respects, we consider it methodologically problematic to rely completely on results produced by computational methods. Instead, we stress the indispensable importance of subject-related knowledge for a proper empirical assessment of the results of a computational, unsupervised analysis. A fruitful dialogue between computational methods, empirical analysis, and political theories is, in our opinion, the best combination when utilizing the analytical power of computational social sciences in the field of parliamentary studies.

Notes on contributors
Anna Ristilä is a PhD student in Digital Linguistics at the School of Languages and Translation Studies at the University of Turku (UTU). Her dissertation focuses on Finnish parliamentary speeches and explores the underlying topic landscape and its changes over time.
Kimmo Elo, Adjunct Professor, is a Senior Researcher at the Centre for Parliamentary Studies at the University of Turku (UTU). His research interests include German politics and history since 1945, theories and politics of European integration, Cold War and post-Cold War intelligence, digital parliamentary studies, theories and methods of network analysis and computational social sciences, as well as knowledge visualization techniques.