Mapping the Dutch Energy Transition Hyperlink Network

Abstract: The Internet facilitates connections between a range of Dutch actors with a stake in the energy transition, including governments, environmental organizations, media outlets and corporations. These connections form a hyperlink network affecting public access to information on energy transition issues. Despite its societal relevance, however, the characteristics of this network remain understudied. The main goals of this study are to shed some light on the topological characteristics of the Dutch energy transition hyperlink network and to reveal the main topics discussed in the network. To do so, we longitudinally collected data from the interactions between key Dutch actors with a stake in the energy transition. Then, these data were analyzed by employing a mixed-method approach combining social network analysis and topic modeling. The results of the social network analyses reveal the existence of a sparse network in which a few private companies and associations emerge as the most authoritative actors and brokers. Furthermore, our analyses show substantial differences among the communication agendas of the organizations of the Dutch energy transition hyperlink network: while public institutions focus on global, national and local policy issues, private companies, associations and NGOs pay much more attention to employment issues.


Introduction
In The Netherlands, the energy transition ("energietransitie") is mobilizing a wide variety of societal actors (public institutions, private companies, NGOs, universities and governmental bodies, among others) with a growing impetus, which did not falter even during the recent pandemic crisis, toward greening energy sources. The path toward the energy transition, whose scope will arguably encompass the whole European Union if the recently proclaimed "Green New Deal" is successful, will trigger enormous changes in the energy sector [1] and in the economic and social spheres. Past energy transitions, such as those from biomass to coal or from coal to oil, deeply transformed our societies and economies. However, in comparison to the present energy transition, those transitions were merely technology led and occurred spontaneously, as a result of unleashing the transformative power encoded in the new forms of energy, sources and processes [2]. Today, for the first time in human history, these changes have become the goal, rather than the result of the onset of new "energy imperatives" [3]. The investment in energy production facilities and the introduction of low-carbon policies have made renewable sources increasingly competitive compared to fossil fuels [1,4], generating a positive impact on many power markets across Europe. A wave of change is reaching, for instance, the automotive [...] of the underlying relational actors' network and to infer knowledge on the actors' characteristics and roles [42,46,47,48,49,50,51].
Web-based tools are usually applied to track and collect hyperlinks between websites for the construction of hyperlink networks [48,49,52]. Despite the advantages in terms of volume and accessibility of data, these data collection techniques present some limitations. One of these limitations is that the crawling parameters used by these search engines are undisclosed and researchers have no control over the hyperlink retrieval process [35]. Another limitation is that the analysis of the hyperlink network structure can only provide quantitative evidence of the communication processes, but it does not say anything about the content of such communications [53]. To overcome these limitations, this research combines a network approach with computational text analysis techniques, providing both information about the communication structure and the content of the communications.
Our methodology follows the behavior of Internet users and is anchored in the natural observation of the digital communication environment. More specifically, the hyperlink network is addressed as an object of study, as well as a source of data and a hermeneutical device. In so doing, we follow Rogers' digital methods approach [54,55], which uses the epistemology of the Internet (its online groundedness) as a methodological basis "in an effort to conceptualize research which follows the medium, captures its dynamics and makes grounded claims about cultural and societal change" [54] (p. 8).

Topic Modeling
Is it possible to extend the network approach to unveil the characteristics of the energy transition discourse in the sphere of web communication? Many contributions from linguistics have explored the relationships between concepts or meanings in a network [56][57][58], whereby each term or concept is represented by a node and the co-occurrence or logical proximity in a text acts as an edge. This method makes it possible to observe the semantic or logical affinity between nodes that underpins clusters of concepts characterized by a shared semantic background [59,60]. In this research, this technique is applied to unveil the topics involved in the energy transition process and to reveal possible relations between topics in the actors' online discourse about the energy transition.
The term topic model refers to a class of text analysis methods designed to encode the content of a text corpus. The core assumption of this methodology is that the meanings are relational ontologies, which depend on other linguistic signs to determine their nature [61]. Thus, a topic can be conceived as a recurring pattern of terms or a cluster of words characterized by high mutual co-occurrence. Unlike in traditional manual text encoding, in the automated encoding the topic generation process builds on probabilistic models to estimate the optimal number of topics and the distribution of these topics in the corpus.
In the last decade, machine-assisted text analysis tools have proven to be an effective method to investigate the distribution of topic probabilities in the digital environment [62][63][64][65]. On the one hand, text analysis on big data poses methodological challenges, because each author refers to their own representation of the phenomenon (the energy transition in this case) using their own vocabulary and writing style. On the other hand, as the volume of textual data analyzed increases, the impact of the contributions of individual authors on the analysis gradually decreases. This reduces the uncertainties arising from stylistic and lexical variability, thus increasing the accuracy of the analysis. The computational approach to text analysis enables large amounts of data to be processed in very short timescales, enhancing the possibilities of studying socioeconomic phenomena on a large scale [66][67][68]. Properly employed, topic modeling reduces the reading effort required and provides high substantive interpretability of the latent topics [69], while at the same time lowering the impact of the biases derived from human coding. Thus, for the purpose of our research, we used the structural topic model (STM), a recent extension of the latent Dirichlet allocation (LDA) model that we explain next.

Materials and Methods
The exploration of energy transition related content in hyperlink networks poses many methodological challenges, especially concerning the retrieval of the data. Finding a representative sample of actors involved in this dynamic and rapidly expanding online communication process implies having access to a broad data source and the ability to distinguish the actors that are involved in the energy transition process from those of the energy sector or adjacent markets. To meet these challenges, we adopted a mixed-method approach by combining social network analysis and computational textual analysis techniques in a three-step procedure (Figure 1).

Figure 1. Procedure designed to investigate the Dutch energy transition online debate. It revolves around three phases: (a) the data gathering phase, which includes a snowball sampling of all websites connected to the seeds and the extraction of texts concerning the energy transition produced by the sampled websites; (b) the data cleaning and preprocessing phase, aimed at filtering and preparing the data; (c) the data analysis phase, which combines social network analysis techniques and computational text analysis techniques.
To speed up the data collection process and make our methodology scalable to large amounts of data, we primarily used unsupervised or semi-supervised tools (see next section), both for textual contents and websites and hyperlink information.

Phase 1: Data Gathering
To collect data, we adopted two different sampling strategies: one aimed at identifying the Dutch actors and their links, and the other focused on collecting the content related to the energy transition created by these actors. To sample the actors, we proceeded in two steps. First, we employed purposive sampling, also referred to as judgmental sampling or expert sampling. The main objective of purposive sampling is to produce a sample based on expert knowledge of a population, selected in a non-random manner. By using this sampling strategy, we drew on the expertise of the project members to retrieve 14 seeds (see Table 1), covering a wide variety of key civil society actors with a stake in the Dutch energy transition. We decided to use purposive sampling because, as Battaglia states, this "is generally considered most appropriate for the selection of small samples often from a limited geographic area or from a restricted population definition, where inference to the population is not the highest priority" [70] (p. 525). The relevance of these 14 seeds in the Dutch energy transition online space was cross-validated by using Twitter data.
Second, in order to enlarge the research sample, a snowball procedure was applied [71,72]. The 14 seeds were used as starting points in the IssueCrawler, a web network location and visualization software developed by the Social Media Foundation which, given the web address of one or more websites, tracks all other connected websites [73,74]. After searching and extracting the websites connected with the seeds, this crawler tracks the hyperlinks occurring between them and returns all the web pages and hyperlinks retrieved. The snowball analysis consisted of a monthly collection of data, from February 2019 to December 2019. The IssueCrawler crawled the 14 specified seeds, captured the outlinks of these seeds and retained them (one degree of separation). A second and a third network collection were conducted at two degrees of separation from the seeds (i.e., connected websites and the websites linked to them) to further expand the sample. Nevertheless, since no new websites were added to the sample in the third network collection (April 2019), we decided to collect the subsequent networks by employing one degree of separation from our original 14 seeds.
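The notion of "degrees of separation" used in the crawl can be illustrated with a breadth-first expansion from the seeds. The following is only a minimal sketch under stated assumptions: it does not reproduce the IssueCrawler, the `snowball` function is hypothetical, and the `toy_links` dictionary stands in for the outlinks that a real crawler would fetch from the web.

```python
from collections import deque

def snowball(seeds, outlinks, max_depth):
    """Breadth-first expansion from the seeds, up to max_depth link hops.

    outlinks is a hypothetical dict mapping a site to the sites it links to.
    Returns the set of sites reached and the list of hyperlinks followed.
    """
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    edges = []
    while queue:
        site = queue.popleft()
        if depth[site] == max_depth:
            continue  # do not follow links beyond the requested depth
        for target in outlinks.get(site, []):
            edges.append((site, target))
            if target not in depth:
                depth[target] = depth[site] + 1
                queue.append(target)
    return set(depth), edges

# Toy link structure (illustrative names only).
toy_links = {
    "seed-a": ["site-b", "site-c"],
    "site-b": ["site-d"],
    "site-d": ["site-e"],
}
one_hop_sites, one_hop_edges = snowball(["seed-a"], toy_links, max_depth=1)
two_hop_sites, _ = snowball(["seed-a"], toy_links, max_depth=2)
```

With `max_depth=1` only the sites directly linked from the seeds are retained; with `max_depth=2` the sites linked from those are added as well, which mirrors the difference between the one-degree and two-degree collections described above.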
The resulting dataset consisted of 2044 websites and 76,116 links. Even considering the well-known homophily patterns that occur among directly connected network nodes [75][76][77][78], a large majority of the actors in our dataset were non-Dutch or not related to the energy transition. To filter these actors out, all Dutch actors were first identified. As an inclusion rule, all ".nl" websites were labeled as Dutch. Websites with a non-country-specific extension (e.g., ".com", ".org", ".net") were also included in the sample if they had a declared geographical position in The Netherlands on the Google search engine. As a result, the dataset was reduced to 145 actors and 156 hyperlinks (edges).
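The inclusion rule can be expressed as a simple predicate. This is a hedged sketch rather than the procedure actually run: the function name `is_dutch` and the `located_in_nl` set are assumptions, the latter standing in for the list of hosts whose Dutch location was checked by hand on the search engine.

```python
from urllib.parse import urlparse

def is_dutch(url, located_in_nl):
    """Apply the inclusion rule to one website.

    located_in_nl is a hypothetical, manually compiled set of hosts with a
    declared position in The Netherlands (not derived automatically here).
    """
    host = urlparse(url).hostname or ""
    if host.endswith(".nl"):
        return True  # country-specific extension
    return host in located_in_nl  # generic extension, Dutch location confirmed

# Illustrative hosts only.
checked_by_hand = {"energy.example.com"}
sample = [
    "https://www.example.nl/energie",
    "https://energy.example.com/about",
    "https://other.example.org/",
]
dutch_sites = [u for u in sample if is_dutch(u, checked_by_hand)]
```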
To collect the content related to the energy transition created by these 145 actors, we used the Lippmannian device [79], a tool developed by the Social Media Foundation. Given a list of websites and one or multiple keywords, this tool retrieves all sentences containing the keywords from these sources. By using this tool, the Bing search engine was queried for the keyword "energy transition" and its Dutch translation "energietransitie". Then, each sentence containing these terms was retrieved and aggregated, together with the information on the source website. This procedure yielded 4703 non-duplicated sentences created by 88 actors.
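The retrieval and de-duplication of keyword sentences can be sketched as follows. This is an illustration only: it does not call the Lippmannian device or any search engine, the `keyword_sentences` helper is hypothetical, and the naive sentence splitter is an assumption made for brevity.

```python
import re

KEYWORDS = ("energy transition", "energietransitie")

def keyword_sentences(pages):
    """Return unique (source, sentence) pairs that mention a keyword.

    pages is a hypothetical dict mapping a source website to its page text.
    """
    seen = set()
    hits = []
    for source, text in pages.items():
        # Naive split after sentence-ending punctuation.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            s = sentence.strip()
            if any(k in s.lower() for k in KEYWORDS) and s not in seen:
                seen.add(s)
                hits.append((source, s))
    return hits

toy_pages = {
    "site-a": "De energietransitie versnelt. Het weer is mooi.",
    "site-b": "The energy transition needs grids. De energietransitie versnelt.",
}
hits = keyword_sentences(toy_pages)
```

Each retained sentence keeps its source website, which is what later allows sentences to be linked back to actors and organization types.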

Phase 2: Data Cleaning and Preprocessing
The actor sub-dataset was further cleaned by filtering on the actors' relation to the energy transition process. To do so, the information on the website source provided by the textual sampling phase was used. Only actors with at least one piece of energy transition related content were included in the analysis. At the end of this procedure, a final dataset of 88 Dutch actors and their collection of sentences related to the energy transition was used as a sample in our analyses. To enrich our dataset, three coders manually classified the type of organization behind each actor's website. Each organization was classified into one of the mutually exclusive categories listed in Table 2.

Table 2. Code numbers and categories applied for classifying the corpus. Descriptions adapted from Cambridge Dictionary (2020).

| Code | Organization Type | Description |
|------|-------------------|-------------|
| 1 | NGOs | Organization that tries to achieve social or political aims, but is not controlled by a government |
| 2 | Private companies | Organization that sells goods or services, not owned by the government |
| 3 | Associations | Group of people or organizations with common interests who work together for a particular purpose, often used in the names of organizations |
| 4 | Public institutions | Governmental organization that exists to serve a public purpose such as education or support for people who need help |
| 5 | Research institutions | Organization that follows research purposes, such as a university |

A first issue in the analysis of the collected textual data concerned its language of origin. For research purposes, all documents were translated into English using the Cloud Translation API [80] provided by Google. The 4703 sentences created by the actors were aggregated into a single corpus of 207,223 words. The distribution of the documents by type of organization (Figure 2) shows that private companies and associations are the organizations creating the largest number of documents (1700 and 1192, respectively). By contrast, the average length of documents is almost homogeneous across groups, ranging between 256 and 299 characters.
Subsequently, the corpus was cleaned and preprocessed according to the following procedure. Each document was stemmed to reduce lexical variability. All numeric characters, symbols, HTML tags and words with fewer than 3 characters were deleted from the corpus. Moreover, a list of 1063 stop words was removed from the corpus.
To focus the analysis on the most relevant words, all terms with a frequency of occurrence in the corpus f < 5 were removed. This procedure further decreased the lexical variability, with little loss in terms of document inclusion, since only 9 documents were excluded from the subsequent analysis (Figure 3).
At the end of the cleaning phase, a total number of 1726 terms in 4693 documents were used for the topic model estimation.
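The cleaning steps described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: stemming is deliberately omitted to keep the example dependency-free, the `preprocess` function is hypothetical, and the stop-word list passed in is an assumption (the actual list contained 1063 words).

```python
import re
from collections import Counter

def preprocess(docs, stop_words, min_freq=5, min_len=3):
    """Clean documents and drop rare terms; stemming is not performed here."""
    tokenized = []
    for doc in docs:
        text = re.sub(r"<[^>]+>", " ", doc.lower())  # strip HTML tags
        tokens = re.findall(r"[a-z]+", text)         # drops digits and symbols
        tokens = [t for t in tokens
                  if len(t) >= min_len and t not in stop_words]
        tokenized.append(tokens)
    # Corpus-wide term frequencies, used for the f < min_freq filter.
    freq = Counter(t for tokens in tokenized for t in tokens)
    kept = [[t for t in tokens if freq[t] >= min_freq] for tokens in tokenized]
    # Documents left without any term are excluded from the analysis.
    return [tokens for tokens in kept if tokens]

toy_docs = ["<b>Wind</b> energy in 2019!", "Wind energy is the future", "Solar"]
cleaned = preprocess(toy_docs, stop_words={"the"}, min_freq=2)
```

In the toy example the third document disappears because its only term is too rare, which is the same mechanism by which a handful of documents were lost in the corpus.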

Phase 3: Data Analysis
To explore the topics underlying the communications about the energy transition, we identified the topics through the structural topic model. The latent Dirichlet allocation (LDA) [81] is one of the most widely used probabilistic models for computational text analysis in the social sciences and economics. Behind the algorithm's generative process lies the assumption that each document can be modeled as a mixture of topics and that each topic can be modeled as a discrete distribution that defines the probability, for each word, of appearing in that topic. Each document, treated as a bag-of-words (a set of terms without syntactic structure, in which the position of a word in the text is irrelevant), is synthetically represented by a probability distribution over K latent topics. Note that within an LDA model, the probability of observing a word in a particular position within a document is only a function of the topic and the parameters of the model. More specifically, the LDA does not allow us to model changes in the estimation of topics and words within documents based on other relevant information, such as the time or the author of the documents.
In order to provide a more accurate estimation of topics related to the energy transition process, this research uses an extension of the LDA, the structural topic model [82]. This model allows the use of metadata as parameters involved in the topic estimation [65], affecting its content (topical content) and proportion in the corpus (topical prevalence). In more detail, a data generative process is defined for each document. After this, the data are used to estimate the most likely values for the parameters in the model. This process generates documents associated with metadata (Xd) starting from the distributions of documents (Dn), topics (Tn) and terms (Wn).
As in the LDA, topics are defined as a mix of terms, where each term is associated with a specific probability of belonging to a topic; documents are defined as a mix of topics, implying the possibility of coexistence of multiple topics in a document.
Assuming that the type of organization promoting the communication could influence the issues discussed, we have included the variable "organization type" as a covariate in the model estimation.
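For reference, the generative assumptions summarized above can be written compactly. The notation below is the one commonly used in the topic-modeling literature and is an assumption of this sketch, not the notation of the original text: $\theta_d$ denotes the topic proportions of document $d$, $z_{d,n}$ the topic assigned to word position $n$, $\beta_k$ the word distribution of topic $k$, and $X_d$ the covariates of document $d$ (here, organization type). Only the prevalence part of the STM is shown; the topical content part is omitted.

```latex
% LDA: topic proportions are drawn from a Dirichlet prior shared by all documents
\theta_d \sim \mathrm{Dirichlet}(\alpha), \qquad
z_{d,n} \mid \theta_d \sim \mathrm{Multinomial}(\theta_d), \qquad
w_{d,n} \mid z_{d,n} \sim \mathrm{Multinomial}(\beta_{z_{d,n}})

% STM (prevalence part): the prior mean of the topic proportions depends on the covariates
\theta_d \mid X_d \sim \mathrm{LogisticNormal}(X_d \gamma, \Sigma)
```

The only change in the second line is the prior on $\theta_d$: instead of a single Dirichlet prior, documents with different covariate values are allowed different expected topic proportions.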
Since the STM is sensitive to initialization parameters, it was necessary to select in advance the number of topics to be extracted from the corpus. To reduce the arbitrariness of our choice, a procedure based on the work of Lee and Mimno [83] was adopted. Considering that the identification of the number of topics represents a partly discretionary choice, this procedure uses a data-driven approach to address this task. The procedure involves an iterative estimation of models to compare the fit of different settings of the K parameter (the number of topics). After an exploration of a wide range of model solutions, the interval K ∈ [10, 20] was selected for a closer exploration. The optimal number of topics was decided by taking into account the topic quality [82], a criterion based on choosing the number of topics with the best ratio between two parameters:
• Semantic coherence [84] is part of the broader concept of mutual information. This metric, recognized as a reasonable surrogate for human coders' judgement, presents higher scores for topics in which the most probable terms have a higher co-occurrence frequency within documents [85];
• Exclusivity refers to the specificity of the words associated with a topic. Topics can be considered exclusive if the most probable terms for a given topic are unlikely to appear within the most probable terms of other topics.
As stated by Schmiedel et al., "While semantic coherence focuses on the internal qualities of single topics, exclusivity takes the similarity between different topics of the same model into account" [86] (p. 18).
We opted for an optimal solution of K = 13 (Figure 4), a good compromise between exclusivity and semantic coherence. This parameter was used for the STM estimation.
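To make the semantic coherence criterion more concrete, the sketch below computes a coherence score from document co-occurrence counts, following the commonly cited form: the sum over ordered pairs of top words of log((D(v_m, v_l) + 1) / D(v_l)), where D counts the documents containing the given word(s). This is an illustration under stated assumptions, not the implementation used for the model selection: the `semantic_coherence` function is hypothetical, and it assumes that every top word occurs in at least one document. Exclusivity is not computed here.

```python
import math

def semantic_coherence(top_words, docs):
    """Coherence of one topic from its top words (ordered by probability).

    docs is a list of tokenized documents. Higher (less negative) scores
    mean that the top words tend to appear in the same documents.
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # Number of documents containing all the given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co = doc_freq(top_words[m], top_words[l])
            score += math.log((co + 1) / doc_freq(top_words[l]))
    return score

toy_docs = [["heat", "gas"], ["heat", "gas"], ["wind", "grid"]]
coherent = semantic_coherence(["heat", "gas"], toy_docs)
incoherent = semantic_coherence(["heat", "wind"], toy_docs)
```

In the toy corpus, "heat" and "gas" always co-occur and obtain a higher score than "heat" and "wind", which never do. Comparing such scores, together with exclusivity, across candidate values of K is the kind of trade-off described above.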

Results
A list of 14 seeds was used for a snowball sampling of Dutch actors engaged in the energy transition hyperlink-sphere. After a multistep cleaning and preprocessing procedure applied to both the hyperlink network and the text content, the topological characteristics of the network and the main topics emerging in this relational space were analyzed.

The Hyperlink Network
Our results reveal the existence of a sparse network consisting of 88 nodes (actors) and 79 edges (hyperlinks). Table 3 shows the descriptive statistics of this network, calculated by using the igraph package for R [87,88]. Both the average path length and the clustering coefficient are quite small (2.482 and 0.043, respectively), suggesting that the network does not exhibit a heterogeneous distribution of attributes over a wide spectrum [90,91]. Thus, unlike many real networks, the nodes of the Dutch energy transition hyperlink network are rather isolated.
The network consists of 39 private companies, 25 associations, 10 NGOs, 8 research institutions and 6 public institutions. Cumulatively, private companies and associations are the main creators of content about the energy transition, with 1700 and 1192 documents, respectively. Conversely, public institutions such as RVO, Rijksoverheid and PBL, and research institutions such as NVO and CE, are the single major contributors (Figure 5). Topologically, the web-based communication structure of the energy transition revolves around a few organizations acting as hubs of the network.

Associations and private organizations are the most central players in the hyperlink network. Specifically, three associations (NVDE, NWEA and Netherlands Platform Warmtepompen) and two private companies (Itho Daalderop and Stadsverwarming Purmerend) have the highest scores of outdegree centrality (Figure 6), emerging as gatekeepers [92,93], i.e., actors that facilitate connections by promoting links in the Dutch energy transition hyperlink network. In terms of indegree centrality, the most authoritative actors [24,41,42,94,95] are associations (NVDE, NWEA, Holland Solar, Dutch Heat Pump Association, BodemenergieNL), with the exception of RVO and Itho Daalderop, a public institution and a private company, respectively. Indeed, among the organizations of the network, the indegree distribution is much more heterogeneous, while the outdegree distribution shows a few highly connected nodes.
These nodes, namely NWEA, NVDE and, to a lesser extent, Itho Daalderop and Stadsverwarming Purmerend, are also hubs for incoming hyperlinks, suggesting that being active in linking to other nodes pays off in terms of reciprocal connections. Nevertheless, some actors such as RVO, BodemenergieNL, Daalderop and Holland Solar, despite their low or nonexistent linking activity (low outdegree), still receive much attention (high indegree) from the other nodes of the network.
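The degree-based measures discussed in this section are straightforward to compute from an edge list. The sketch below is illustrative only: the study used igraph for R, whereas this is plain Python with a hypothetical `degree_summary` helper and toy node names, and the density definition (edges divided by the number of possible directed links, without self-loops) is an assumption that may differ from the one behind Table 3.

```python
def degree_summary(nodes, edges):
    """Indegree, outdegree and density of a directed network.

    edges is a list of (source, target) hyperlinks between the given nodes.
    """
    indeg = {n: 0 for n in nodes}
    outdeg = {n: 0 for n in nodes}
    for source, target in edges:
        outdeg[source] += 1  # links sent: gatekeeping activity
        indeg[target] += 1   # links received: authority
    n = len(nodes)
    density = len(edges) / (n * (n - 1))
    return indeg, outdeg, density

# Toy network with illustrative node names.
toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("A", "B"), ("A", "C"), ("B", "A"), ("C", "B")]
indeg, outdeg, density = degree_summary(toy_nodes, toy_edges)
```

Under this definition, a directed network with 88 nodes and 79 edges would have a density of 79 / (88 × 87) ≈ 0.0103, which is consistent with the description of the network as sparse.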

The Structural Topic Model (STM)
Based on the most recurring terms and looking at the most representative documents, two coders assigned a label summarizing the main argument of each topic (Table 4).
"The Employment Implications of the Green" and "The Natural Gas Alternative" are the most prevalent topics, with 11.26% and 11.12% of frequency in the corpus (topical prevalence), respectively (Figure 7); the other most represented topics are "Policy Making for Climate" and "The Heating Market", with a topical prevalence of 10.49% and 9.12%, respectively. On the other side of the spectrum, "The Solar Energy Sector", "Academy and Research Progress" and "Smart Cities and Energy Grid" are the least addressed topics in the corpus, with a prevalence of 5.68%, 5.60% and 4.05%, respectively.
To investigate the overlapping relationships between the topics in terms of topic co-existence in the corpus documents, a partial correlation matrix of the prevalence scores was calculated. Using the technique described in Epskamp & Fried [96], we built a sparse graph where each topic represents a node and each link is a significant correlation (p < 0.01) between a pair of nodes (Figure 8).
The strongest association emerges between the topics "The Natural Gas Alternative" and "The Heating Market", with a positive correlation of r = 0.26. Topic 11 ("Policy Making for Climate") and Topic 6 ("The local economic strategy") have the second strongest association, with a correlation of r = 0.18. Other weaker correlations occur between Topic 11 and, respectively, Topic 5 ("Global level debate", r = 0.13) and Topic 10 ("Employment Implications of Green", r = 0.13). Topic 10, "Employment Implications of the Green", is, in turn, weakly correlated with Topic 6 ("The local economic strategy", r = 0.14) and Topic 7 ("The Heating Market", r = 0.13), while Topic 5 shows a weak positive correlation with Topic 4 ("The Natural Gas Alternative"). Topologically, "Policy Making for Climate" and "Global level debate" are the most connected topics, while, remarkably, "Technical challenges in R&D", together with "The Wind Energy Sector" and "The Solar Energy Sector", are loosely connected.
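As a reminder of what a partial correlation network encodes, the standard relations can be stated as follows. The symbols are assumptions of this sketch and do not come from the original text: $\Sigma$ is the covariance matrix of the topic prevalence scores, $\Omega = \Sigma^{-1}$ its inverse (precision matrix) with entries $\omega_{ij}$, $S$ the sample covariance matrix and $\lambda$ a tuning parameter. The second expression is the graphical lasso objective commonly used to obtain a sparse estimate; whether and how it was tuned in this study is not stated here.

```latex
% Partial correlation between topics i and j, controlling for all other topics
\rho_{ij \mid \text{rest}} = -\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}},
\qquad \Omega = (\omega_{ij}) = \Sigma^{-1}

% Sparse estimate of the precision matrix (graphical lasso)
\hat{\Omega} = \arg\max_{\Omega \succ 0}
\left[ \log\det\Omega - \mathrm{tr}(S\,\Omega) - \lambda \sum_{i \neq j} \lvert \omega_{ij} \rvert \right]
```

An edge between two topics is absent whenever the corresponding $\omega_{ij}$ is zero, which is why the resulting graph only shows associations that remain after accounting for all other topics.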
Connections (edges) in the topic graph indicate the extent to which document contents overlap. The most connected topics are thus able to exchange information through documents (websites) belonging to actors of different backgrounds. To understand the hierarchical structure of this exchange and how topics group around shared issues, a hierarchical clustering analysis was carried out using the R package stmCorrViz [82,97]. At the first level of clustering, the STM topics fall into five clusters, as shown in Table 5.
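The grouping step can be illustrated with a small agglomerative clustering sketch in Python. It is a generic average-linkage procedure on the distance 1 − r, offered only as an assumption about how such a grouping may be obtained; it does not reproduce the internals of stmCorrViz. The two non-zero correlations are the values reported above for Topics 4 and 7 and for Topics 6 and 11, while the zeros are placeholders chosen for the example.

```python
def average_linkage(dist, labels, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(labels))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the members of two clusters.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [[labels[i] for i in cluster] for cluster in clusters]


labels = ["Topic 4", "Topic 7", "Topic 6", "Topic 11"]
# Illustrative correlations: 0.26 and 0.18 come from the text, zeros are placeholders.
corr = [[1.00, 0.26, 0.00, 0.00],
        [0.26, 1.00, 0.00, 0.00],
        [0.00, 0.00, 1.00, 0.18],
        [0.00, 0.00, 0.18, 1.00]]
# Turn similarity into distance so that strongly correlated topics are close.
dist = [[1.0 - value for value in row] for row in corr]

groups = average_linkage(dist, labels, n_clusters=2)
print(groups)
```

With these inputs the two most correlated pairs are merged first, which mirrors the pairing of Topics 4 and 7 and of Topics 6 and 11 described in the text.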
Topic 1 and Topic 10, referring to "The Wind Energy Sector" and "Employment Implications of the Green", group together around the terms "wind", "cooperate", "build", "people", "project", "work" and "municipality", the representative terms of cluster Ia. The thematic link between Topic 6 and Topic 11, previously highlighted by the correlation graph, is reflected in the hierarchical correlation analysis, where the relationship between climate politics and local economic strategies is made clear in cluster Ib through the terms "policy", "agreement", "region", "climate", "achieve", "government" and "invest". As expected, the intuitive association between "The Natural Gas Alternative" (Topic 4) and "The Heating Market" (Topic 7) is confirmed by the analysis and represented by cluster IIa through the terms "transport", "nature", "supply", "source", "heat", "network" and "gas". Another intuitive association, the one between Topic 8 and Topic 9, respectively "Academy and Research Progress" and "Technical challenges in R&D", is confirmed and summarized by cluster IIb through the terms "technology", "vision", "develop", "solution", "environment", "innovation" and "challenge". Finally, cluster IIc ("electricity", "city", "smart", "storage", "grid", "environment", "solar") groups together the themes concerning "The Solar Energy Sector" and "Smart Cities and Energy Grid".
Table 5. Hierarchical clustering of the latent topics. Topics are grouped in five clusters, which are included in two main groups: Cluster Group I comprises Topics 1, 2, 6, 10 and 11. Cluster Group II comprises Topics 3, 4, 5, 7, 8, 9, 12 and 13. Next to each cluster, the most representative terms are reported.
To gain a better understanding of the relation between the type of organization and the topics discussed in the Dutch energy transition hyperlink network, we calculated the estimated differences in topic proportion between the five types of organizations. Figure 9 shows the effect of the type of organization on topic prevalence in the corpus. The eight topics with the highest estimated differences per organization-type level are shown.
The agenda of public institutions seems to be set on topics related to "The National Regulation" (Topic 2), "Global level debate" (Topic 5), "The local economic strategy" (Topic 6) and "Policy Making for Climate" (Topic 11). In contrast, our results show that some issues of paramount social relevance regarding the impact of the energy transition on employment and research, represented by Topic 10 ("Employment Implications of the Green") and Topic 8 ("Academy and Research Progress"), are under-represented in the discussions of the public institutions.
Dutch associations contribute extensively to the communication on local economic strategy adjustment (Topic 6). In addition, they seem to discuss issues related to employment and climate policy extensively (Topic 10 and Topic 11). Similarly, NGOs contribute to the communication on Topic 10, but their contributions seem to revolve around natural energy resources, "The Natural Gas Alternative" (Topic 4). The communication agenda of the research institutions seems to be limited to aspects of scientific progress (Topic 8) and its global impact (Topic 5). Moreover, the core topics discussed by private organizations concern "The Heating Market" (Topic 7) and the energy supply (Topic 4), although the contribution made by private companies to the debate on "Employment Implications of the Green" is also significant (Topic 10). For details on the marginal topic proportion of each covariate level for each topic, see Appendix A.
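The comparison between organization types can be pictured with a deliberately simplified Python sketch: the mean share of one topic is computed per organization type and two types are then compared. This is a plain difference of means, not the regression-based estimate with uncertainty produced by the STM framework; the document shares and group labels below are invented for the example.

```python
from collections import defaultdict


def mean_share_by_group(docs):
    """docs is a list of (group, topic_share) pairs; returns {group: mean share}."""
    totals = defaultdict(lambda: [0.0, 0])
    for group, share in docs:
        totals[group][0] += share
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Hypothetical shares of a single topic in five documents (not data from the study).
docs = [("public", 0.10), ("public", 0.20),
        ("private", 0.40), ("private", 0.30),
        ("ngo", 0.35)]

means = mean_share_by_group(docs)
# Positive values mean the topic is more prevalent among private companies.
difference = means["private"] - means["public"]
print(means, difference)
```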

Discussion
This study shows that the online discourse about the energy transition in The Netherlands is led by a relatively small number of actors. Among these actors, most of the contributions to the energy transition debate come from private companies and associations. Remarkably, none of the hundreds of energy cooperatives in The Netherlands, one of the largest communities in Europe, was captured by the snowball sampling, suggesting that these are either absent from the energy transition online space or disconnected from the actors collected via our methodology.
As for the structure of the hyperlink network, the actors seem to weave a sparse network. However, as highlighted by Park & Thelwall [35], low network density and the absence of small-world properties are quite frequent in hyperlink networks within the World Wide Web [98,99].
Moreover, since the energy transition process is characterized by strong innovation and structural changes, low cohesion within actor groups (organization types) could reflect an adaptive need of the actors themselves [100]. In fact, as pointed out by Gargiulo [101], high cohesion within groups can affect an organization's adaptation to significant changes in the environment. Conversely, poor cohesion between actor groups is often seen as an obstacle to change and innovation. When this is the case, Newman suggests that it is desirable for public institutions to act as brokers between the various groups of actors in the network.
In the hyperlink network of the Dutch energy transition, the organizations that seem to occupy a brokerage position are mainly associations and private companies (outdegree centrality). In fact, among all the public institutions, only RVO seems to play a brokerage role.
Moreover, RVO is also the only public institution that can be considered authoritative (indegree centrality) in the Dutch energy transition hyperlink network, together with associations such as Holland Solar, DHPA and NVDE and the private company Itho Daalderop. This strong centrality of the associations in the Dutch network seems to be in line with a context which, as defined by Kemp [21], is characterized by a pronounced corporatism.
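For readers less familiar with the measures used above, the following Python sketch computes network density together with indegree and outdegree for a directed hyperlink network. The edge list and the actor names A to D are hypothetical and do not correspond to any organization in the study; density is taken here as the number of links divided by the number of possible directed links, n(n − 1).

```python
from collections import Counter

# Hypothetical hyperlinks: (source site, target site).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")]

nodes = sorted({node for edge in edges for node in edge})
n = len(nodes)

# Outdegree counts links a site sends; indegree counts links a site receives.
outdegree = Counter(source for source, _ in edges)
indegree = Counter(target for _, target in edges)

# Density of a directed network without self-loops.
density = len(edges) / (n * (n - 1))

for node in nodes:
    print(node, "in:", indegree[node], "out:", outdegree[node])
print("density:", round(density, 3))
```

In this toy network, C receives the most links and would be read as the most authoritative actor, whereas A sends the most links.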
As for the topics discussed in the network, our analyses reveal the existence of 13 topics. Not surprisingly, "The Natural Gas Alternative" and "The Heating Market" are two of the most relevant discussion topics. Despite the renewable energy target of 32% of total energy sources by 2030 [74], set in the regulatory framework "Clean Energy for All Europeans" (adopted in 2016), The Netherlands remains the second largest European producer of natural gas. This is partially due to the substantial fossil resources the country possesses [102]. Indeed, natural gas is still the primary source of energy for heating Dutch houses [103].
Particularly relevant in the Dutch energy transition debate are the implications of the energy transition for the job market. The prominence of this issue highlights the real concerns of employers about international regulations (Topic 11) and the resulting national economic strategies (Topic 6) for a "cleaner" industrial sector (Topic 12). These concerns have often generated friction, as in October 2018, when the employer association VNO-NCW blocked the introduction of a national CO2 levy for the industrial sector [104].
The links among these issues, which clearly emerge in the topic correlation graph, reflect the search for a better balance between policy making and economic production.
In fact, the 13 topics tend to collapse into two macro-themes, (a) topics related to governance of local, national and global policies and (b) topics related to technological development aspects of the energy transition process.
It is also possible to observe substantial differences in the communication agendas of the organizations of the Dutch energy transition hyperlink network. Public institutions focus, for instance, on global, national and local policy issues and seem not to pay much attention to the research and employment implications of the energy transition. Private companies, associations and NGOs, on the other hand, pay much attention to employment issues, while the web-based communication promoted by research institutions is mainly self-referential, focusing on the academic sphere and scientific progress.
The failure of research and public institutions to address the issue of employment is a warning signal for the whole community involved in the energy transition process and, in particular, for those actors with a major (political) responsibility. While academics and public institutions succeed in connecting with the corporate world in terms of topic contents, they seem to fail to communicate with the rest of society, represented in our network by associations and NGOs.

Conclusions
The aim of this study was to reveal the main actors of the Dutch energy transition hyperlink network and find the topics discussed in the network.
Our results show that few companies and associations emerge as the most authoritative actors and brokers in the network. Moreover, our topic model reveals that the 'Employment Implications' and the 'Natural Gas' are the two most discussed topics in the Dutch energy transition hyperlink network. In addition, our analyses reveal substantial differences among the communication agendas of the organizations of the Dutch energy transition hyperlink network; while public institutions focus on global, national and local policy issues, private companies, associations and NGOs pay much more attention to employment issues.
The path toward a low-carbon economy and a more sustainable society, however long and difficult it may be, requires a cohesion of action that is not seen in the hierarchical Dutch energy transition hyperlink network. In this network, some actors (e.g., public and research institutions) seem to be confined in communication echo chambers, revolving around discussions about the global aspects of the energy transition and its technological specificities, and appear to fail to semantically interact with other actors, such as associations and NGOs, whose discourse revolves around the consequences of the energietransitie for employment and its impact at a local level. A possible explanation is the tendency of public institutions to focus on the regulatory aspects of the energy transition at a national and supranational level, while failing to coordinate the debate on major social issues, such as the employment implications of the energy transition. Similarly, research institutions seem to fail in bridging the academic realm with the broader social landscape.
A higher centrality of political institutions in the Dutch energy transition hyperlink network, ideally supported by research institutions, could potentially lead to a wider integration of the local instances (e.g., NGOs), while allowing a wider and more representative vision of the ongoing transition process.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Table A1. Effect of the "organization type" covariate on the 13 topics.