An introduction to network analysis for studies of medication use

Background: Network Analysis (NA) is a method that has been used for decades in various disciplines such as the social sciences and ecology. So far, NA has not been used extensively in studies of medication use. Only a handful of papers have used NA in Drug Prescription Networks (DPN). We provide an introduction to NA terminology alongside a guide to creating and extracting results from medication networks. Objective: To introduce readers to NA as a tool to study medication use by demonstrating how to apply different NA measures to 3 generated medication networks. Methods: We used the Norwegian Prescription Database (NorPD) to create a network that describes co-medication in elderly persons in Norway on January 1, 2013. We used the Norwegian Electronic Prescription Support System (FEST) to create another network of severe drug-drug interactions (DDIs). Lastly, we created a network combining the two networks to show the actual use of drugs with severe DDIs. We used these networks to elucidate how to apply and interpret different network measures in medication networks. Results: Interactive network graphs are made available online, and Stata and R syntaxes are provided. Various useful network measures for medication networks were applied, such as network topological features, modularity analysis and centrality measures. The edge list data used to generate the networks are openly available in an open data repository for readers to explore and use. Conclusion: We believe that NA can be a useful tool in medication use studies. We have provided information and hopefully inspiration for other researchers to use NA in their own projects. While NA is useful for exploring and discovering structures in medication use studies, it also has limitations: it can be challenging to interpret and it is not suitable for hypothesis testing.


Introduction
Studies in social pharmacy and pharmacoepidemiology often utilize highly complex data and require the use of sophisticated methods to discern important patterns. Data used for quantitative studies in social pharmacy and pharmacoepidemiology can be described as attribute data and relational data. Attribute data includes the characteristics of the studied objects (e.g. sex, age, medication use, sociodemographic information, etc.) while relational data contains the various relationships between subjects. Attribute data is suitably studied with quantitative analyses, whereas for relational data, Network Analysis (NA) is the appropriate approach 1 . The subjects studied in network analyses can take many different forms.
A network can be described as a graph that shows the interconnections between a set of actors.
Each actor is represented by a node and each connection between these nodes is represented by an edge 2 . NA is a mathematical approach to study the relationships among nodes 3 . The mathematical background of NA is summarized elsewhere 4,5 .
Network Analysis has its roots in many research disciplines 6 . Network analysis is used, among others, in social studies 7 , ecological studies 8 , genetics 9 and systems pharmacology 10 .
As seen in figure 1, a network can be undirected (a and b) or directed (c and d). In a directed network, arrows show the direction of the relationship between nodes. In an undirected network, the relationship does not have a specific direction. The network edges can be weighted (b and d) or unweighted (a and c). In an unweighted network, two nodes either have a relationship or not, while a weighted network also considers the strength of the relationship.
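The four network types described above can be illustrated with a minimal Python sketch. The node names, the weights and the helper `build_adjacency` are invented for illustration and do not come from the study data.

```python
# Toy edge list: (source, target, weight). All values are invented.
edges = [("A", "B", 3), ("B", "C", 1), ("A", "C", 2)]


def build_adjacency(edges, directed=False, weighted=False):
    """Return a dict-of-dicts adjacency structure for the given edge list."""
    adj = {}
    for source, target, weight in edges:
        value = weight if weighted else 1  # unweighted: presence only
        adj.setdefault(source, {})[target] = value
        adj.setdefault(target, {})
        if not directed:
            # An undirected edge is stored in both directions.
            adj[target][source] = value
    return adj


undirected_unweighted = build_adjacency(edges)
directed_weighted = build_adjacency(edges, directed=True, weighted=True)
```

In the undirected version the edge between A and B can be read from either end, while in the directed, weighted version only the entry from A to B exists and it carries the weight 3.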

Use of Network Analysis in Public Health
Transmission networks have been used to examine the risk of disease transmission by investigating the relations between infected and healthy people [11][12][13] . Another form of transmission is the transmission of information. NA has been used to visualize the dissemination of public health information to different organizations and consumers. Some network characteristics reveal the pattern of information spread and the main actors contributing the most to it. Simulated networks can be used to suggest how to accelerate information spread 14 . An example of this type of network is the diffusion of information among physicians regarding a new drug. That study showed that more socially integrated physicians introduced the drug months before their more isolated colleagues 15 . NA has also been used to study how health workers' professional and personal behavior impacts health services 16,17 .

Drug Prescription Network (DPN)
The use of NA in pharmacoepidemiological studies of prescribed or dispensed medications is relatively new. To our knowledge, Cavallo et al. were the first to study a drug prescription network in 2013. They used medications as the nodes and the number of patients being prescribed these medications as edges. They aimed mainly at describing the topology of the co-prescription network to demonstrate which drug classes are most co-prescribed. They also compared the male/female networks and networks from different age strata 18 .
Bazzoni et al. were the first to use the term Drug Prescription Networks (DPN) in their paper published in 2015. They concluded that the DPNs are dense, highly clustered, modular and assortative. Density reflects frequent co-prescribing. Modularity suggested that the network could be subdivided into clusters. The study also showed that it is possible to highlight spatial and temporal changes by comparing different networks 19 .

Topology analysis
Network topological features refer to a group of characteristics, which either describe the network as a whole (network-level) or define individual actors of the network (node-level). There are many topological measures and each of them gives information about a specific network attribute, which then may warrant further investigation.
a. Global network description (network-level): A group of measures that describe the network as a whole.
-Number of nodes: the total number of drugs in the network. The network nodes can be grouped to show the number of drugs in each drug class. Different networks of different populations will have different distributions of drugs in the drug classes.
-Density: the density of a network is the number of actual edges divided by the total number of edges that would exist if all the nodes in the network were connected. For an undirected network, this potential number can be calculated by the formula below, where $n$ is the number of nodes:
$$\frac{n(n-1)}{2}$$
The network density can be useful for comparing different networks that describe the same type of drug-drug relation.
-Assortativity: a network is assortative when nodes that share a similar trait tend to connect. This trait can be any of many characteristics, such as the nodes' degree. In that case, assortativity means that nodes with a high number of edges tend to connect to each other. Assortativity can also be examined in terms of other characteristics the nodes have in common.
The assortativity coefficient is measured using the Pearson correlation. It ranges between -1 and 1, where 1 is most assortative 21 .
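As a rough illustration of the two network-level measures above, the following Python sketch computes the density of an undirected network and a degree-based assortativity coefficient as a Pearson correlation between the degrees at the two ends of each edge. The toy edge list is invented, and the functions are a simplified sketch, not the nwANND or igraph implementations.

```python
from math import sqrt

# Toy undirected network; node names are invented.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]


def density(edges):
    """Actual edges divided by the n(n-1)/2 possible undirected edges."""
    nodes = {node for edge in edges for node in edge}  # isolated nodes are not counted
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2)


def degree_assortativity(edges):
    """Pearson correlation of the degrees found at both ends of every edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        # Count each undirected edge in both directions so the measure is symmetric.
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when all nodes have the same degree
    return cov / sqrt(var_x * var_y)


network_density = density(edges)
assortativity = degree_assortativity(edges)
```

In this toy network the high-degree node A is mostly connected to lower-degree nodes, so the coefficient is negative (disassortative).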

b. Node-level measures
Node-level measures describe the features of the different nodes across the network.

Centrality measures:
Centrality measures indicate the importance of the network nodes by assigning a score to each of them. There are many different centrality measures and each of them can be used to describe a specific type of importance. By comparing the different centrality measures of a node, we can understand the different ways in which a node is influential in the network. This paper will discuss 4 common types of centrality: degree, betweenness, closeness, and eigenvector centrality. The mathematical explanations of these measures are given elsewhere 22,23 .

Degree centrality
Degree centrality is the number of edges that are connected to a node. A higher score indicates that the node is connected to many other nodes. Node A in figure 3 has a degree score of 4. In a directed network, the degree is split into in-degree, which is the number of edges that point to a node, and out-degree, which is the number of edges that originate from the node. In- and out-degrees will therefore show the directions of relationships in a network. In figure 3, nodes C and D have an in-degree score of 3, while nodes A and G have an out-degree score of 3.
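A minimal Python sketch of degree, in-degree and out-degree is given below. The directed edge list is invented and does not reproduce figure 3.

```python
from collections import Counter

# Toy directed edge list (source, target); invented for illustration.
directed_edges = [("A", "C"), ("A", "D"), ("B", "C"), ("G", "D")]

# Out-degree: edges that originate from a node.
out_degree = Counter(source for source, _ in directed_edges)
# In-degree: edges that point to a node.
in_degree = Counter(target for _, target in directed_edges)
# Total degree: every edge touching the node, regardless of direction.
degree = out_degree + in_degree
```

Here node A only sends edges (out-degree 2, in-degree 0), while node C only receives them (in-degree 2, out-degree 0).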

Betweenness centrality
The betweenness centrality of a node indicates how many times this node lies on the shortest possible path connecting two other nodes. The more shortest paths pass through a node, the higher its betweenness centrality score 22 . In figure 3, node A has the highest betweenness centrality score of 1.5.
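The idea of counting shortest paths through a node can be sketched in Python with Brandes' algorithm for an unweighted, undirected network. The toy network is invented, and the scores are unnormalized, so they are not comparable to the figure 3 values.

```python
from collections import deque

# Toy undirected network as adjacency sets; invented for illustration.
adj = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C"},
}


def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes backwards.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each pair of nodes was visited from both ends, so halve the totals.
    return {v: value / 2 for v, value in score.items()}


scores = betweenness(adj)
```

In this example only node B lies between other nodes (on the shortest paths A to C and A to D), so it is the only node with a non-zero score.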

Closeness centrality
Closeness centrality is based on the average distance between the node and all other nodes in the network: the shorter this average distance, the higher the score. Nodes with the highest closeness score have the shortest distances to all other network nodes. The nodes A, B and F have the highest closeness centrality score of 1.
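A minimal Python sketch of closeness centrality follows, assuming the common definition as the inverse of the average shortest-path distance to the other reachable nodes. The toy network is invented and the function is not taken from nwcommands or igraph.

```python
from collections import deque

# Toy undirected network as adjacency sets; invented for illustration.
adj = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C"},
}


def closeness(adj, node):
    """Inverse of the average shortest-path distance from node to reachable nodes."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search gives shortest distances
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


closeness_scores = {v: closeness(adj, v) for v in adj}
```

Node B is one step away from every other node, so it reaches the maximum score of 1, while the peripheral node A scores lower.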

Eigenvector centrality
Eigenvector centrality is a measure of the importance of a node in a network based on the node's connections with other vital nodes. Relative scores are given to all nodes in the network based on the concept that connections to high-scored nodes give a higher score to the node than equal connections to low-scored nodes. In other words, a high eigenvector score means that a node is connected to many nodes, which themselves are connected to important nodes in the network and have high eigenvector centrality scores. This means that a node with a high eigenvector centrality score is not necessarily connected to the highest number of nodes in the network but is connected to nodes with a high number of edges 24 . Node C in figure 3 has the highest eigenvector centrality score of 1.
Once the centrality of each node has been determined, it can be useful to view the network from the perspective of a single important node. This is called an ego-network: it visualizes the part of the network that contains the node of interest and the nodes that are directly connected to it.
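Both ideas can be sketched in a few lines of Python. The eigenvector scores below are obtained by simple power iteration and rescaled so that the highest score is 1; the ego-network helper returns a node, its direct neighbours and the edges among them. The toy network and both function names are invented for illustration.

```python
# Toy undirected network as adjacency sets; invented for illustration.
adj = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C"},
}


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration on (adjacency + identity); the maximum score is scaled to 1."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # A node's new score is its own score plus the scores of its neighbours.
        new = {v: score[v] + sum(score[w] for w in adj[v]) for v in adj}
        top = max(new.values())
        new = {v: value / top for v, value in new.items()}
        if max(abs(new[v] - score[v]) for v in adj) < tol:
            return new
        score = new
    return score


def ego_network(adj, ego):
    """Return the ego, its direct neighbours, and the edges among them."""
    members = {ego} | set(adj[ego])
    edges = {tuple(sorted((u, v))) for u in members for v in adj[u] if v in members}
    return members, edges


eigen_scores = eigenvector_centrality(adj)
ego_nodes, ego_edges = ego_network(adj, "A")
```

Adding the node's own score in each step (the identity term) does not change the ranking but keeps the iteration from oscillating on some network shapes.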

c. Edge-level measures
Edge-thickness: in a weighted network, the edge-thickness represents a quantitative measure of the strength of the connection between two nodes. This representation is unique to NA and can be used to study many research questions. We will show an example where the number of users who co-medicated a pair of medications is used to represent the edge-thickness. In this context, thicker edges represent more frequently used pairs of medications.

Modularity analysis (Community detection)
One key feature of the network structure is its modularity. A module is a group of nodes that have many connections between each other and few(er) connections to the other nodes in the network 25 . There are many techniques of community detection including density-based, centrality-based, partition-based and hierarchical clustering techniques 20,26,27 .
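To make the notion of a module more concrete, the Python sketch below scores a given grouping of nodes with the widely used modularity value Q, which compares the share of edges inside each group with the share expected from the node degrees alone. It only evaluates a partition and does not detect communities itself; the toy network and the function name are invented, and this is not necessarily the technique applied to the medication networks.

```python
# Toy network: two triangles joined by a single edge; invented for illustration.
edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "E"), ("E", "F"), ("D", "F"),
    ("C", "D"),
]


def modularity(edges, communities):
    """Modularity Q of a partition of an unweighted, undirected network."""
    m = len(edges)
    membership = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)    # edges with both ends in the community
    degree_sum = [0] * len(communities)  # summed degree of the community's nodes
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


q_two_modules = modularity(edges, [{"A", "B", "C"}, {"D", "E", "F"}])
q_one_module = modularity(edges, [{"A", "B", "C", "D", "E", "F"}])
```

Splitting the toy network into its two triangles gives a clearly positive Q, whereas treating the whole network as a single module gives Q = 0.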

Network comparison
It is possible to compare two or more networks to show the changes over time (temporal), between different areas (spatial), or between different groups of patients. These comparisons can be done by comparing the characteristics of the networks to highlight the differences in numbers and influences of the nodes. Another way to compare different networks is to subtract or divide the values of the edges between two networks. This will create edges representing the differences between the networks, see supplementary 4. By comparing many networks, dynamic graphs can be created showing the topological changes from one network to the next. Nodes will appear, disappear or change their locations as the dynamic graph moves through the different networks.
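Subtracting edge values between two networks can be sketched as follows in Python. The two weighted networks, the drug labels and the helper names are invented; edges that are missing from one network are treated as having weight 0, which is an assumption of this sketch.

```python
def pair(drug_1, drug_2):
    """Order-independent key for an undirected edge."""
    return tuple(sorted((drug_1, drug_2)))


# Two toy weighted networks (edge -> number of users); all values are invented.
network_t1 = {pair("drug_a", "drug_b"): 100, pair("drug_a", "drug_c"): 40}
network_t2 = {pair("drug_b", "drug_a"): 130, pair("drug_b", "drug_c"): 15}


def edge_difference(later, earlier):
    """Edge-wise difference; an edge absent from a network counts as weight 0."""
    all_pairs = set(later) | set(earlier)
    return {p: later.get(p, 0) - earlier.get(p, 0) for p in all_pairs}


difference = edge_difference(network_t2, network_t1)
```

Positive values mark drug pairs that became more common in the second network, negative values mark pairs that became less common or disappeared.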

Bipartite networks
A network can be uni- or multipartite. We will only discuss uni- and bipartite networks. Unipartite networks have one set of nodes, while in bipartite networks the nodes belong to two disjoint sets (such as prescribers and patients). In a bipartite network, edges connect the nodes from different sets 29
The aim of this paper is to introduce the readers to NA as a tool to study medication use by demonstrating a practical real-life example of medication use in the elderly in Norway whenever it is possible, otherwise by giving an example from other studies.

Methods
We created a network of co-medication in elderly persons in Norway. We also created a network describing the severe drug-drug interactions (DDIs). Finally, we generated a network with the actual use of drugs with severe DDIs by combining the previous two networks.
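One plausible way to combine the two networks is to keep only those co-medication edges whose drug pair also appears in the severe DDI network. The Python sketch below illustrates this with invented drug labels and user counts; it is an assumption about the combination step, not the syntax used in the study.

```python
def pair(drug_1, drug_2):
    """Order-independent key for an undirected edge."""
    return tuple(sorted((drug_1, drug_2)))


# Toy co-medication network: edge -> number of users (invented values).
co_medication = {
    pair("drug_a", "drug_b"): 250,
    pair("drug_a", "drug_c"): 90,
    pair("drug_b", "drug_d"): 12,
}

# Toy severe DDI network: unweighted set of interacting pairs (invented).
severe_ddi = {pair("drug_d", "drug_b"), pair("drug_c", "drug_e")}

# Combined network: severe DDI pairs that are actually used together,
# weighted by the number of users from the co-medication network.
ddi_in_use = {p: users for p, users in co_medication.items() if p in severe_ddi}
```

Interacting pairs that nobody uses together (here drug_c and drug_e) drop out, as do co-medicated pairs without a severe interaction.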

Co-medication network
The dataset used comes from the Norwegian Prescription Database (NorPD). It covers all prescriptions dispensed to elderly persons (≥ 65 years) in Norway between 2012 and 2014. The NorPD collects data from all pharmacies in Norway and covers all outpatient dispensing for the entire Norwegian population. Details on the NorPD are published elsewhere 31 . In total, the dataset included 765,383 patients, 344,285 men (45%) and 421,098 women (55%), with a mean age of 75 years. Edges in this network represent the number of patients who combined pairs of medications. In order to define co-medication, we created treatment episodes using the Proportion of Days Covered (PDC) approach 32 . We assumed that patients used one Defined Daily Dose (DDD) 33 per day and added 20% to each prescription duration to account for imperfect adherence. We also allowed a medication-free gap of 14 days before ending a treatment episode and starting another. This means that if the patient exceeds 14 days without the medication, the treatment episode for this medication ends, and a new episode starts if the patient picks up a new prescription. Finally, co-medication was defined as overlapping drug treatment episodes at the index date, January 1, 2013.
For each pair of nodes (drugs), we summed up the number of co-medication occurrences (i.e. number of patients combining these two drugs) to create a weighted and undirected network.
We excluded medications that have no defined DDD, such as medications for topical use, vaccines, and ophthalmologicals. In total, we excluded 357 medications (217 local and 140 systemic drugs). The co-medication network is shown here: https://mohsenaskar.github.io/comedication/network/. The network is searchable by substance name. Clicking on any node shows the ego-network of this node as well as some network measures.
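The construction of the co-medication edges can be sketched in Python as below. The sketch assumes one DDD per day, a 20% extension of each dispensing and a 14-day permissible gap, as described above; how overlapping supplies are handled (here the later end date wins, without carry-over) and all names and toy dispensings are assumptions made for illustration, not the authors' Stata syntax.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import combinations

GRACE = 1.2                      # 20% added to each prescription duration
MAX_GAP = timedelta(days=14)     # permitted medication-free gap
INDEX_DATE = date(2013, 1, 1)


def build_episodes(dispensings):
    """dispensings: [(dispense_date, number_of_ddd)] for one patient and one drug."""
    episodes = []
    for day, n_ddd in sorted(dispensings):
        end = day + timedelta(days=round(n_ddd * GRACE))  # one DDD per day
        if episodes and day - episodes[-1][1] <= MAX_GAP:
            episodes[-1][1] = max(episodes[-1][1], end)   # extend current episode
        else:
            episodes.append([day, end])                   # start a new episode
    return episodes


def drugs_on_index_date(patient, index_date=INDEX_DATE):
    """patient: {drug: [(dispense_date, number_of_ddd), ...]}."""
    return sorted(
        drug for drug, dispensings in patient.items()
        if any(start <= index_date <= end for start, end in build_episodes(dispensings))
    )


def co_medication_edges(patients):
    """Count, for every drug pair, the patients using both drugs on the index date."""
    edges = Counter()
    for patient in patients.values():
        edges.update(combinations(drugs_on_index_date(patient), 2))
    return edges


# Invented toy data: two patients, three drugs.
patients = {
    1: {"drug_x": [(date(2012, 10, 1), 100)],
        "drug_y": [(date(2012, 12, 1), 30)],
        "drug_z": [(date(2012, 6, 1), 30)]},
    2: {"drug_x": [(date(2012, 11, 15), 60)],
        "drug_y": [(date(2012, 12, 20), 30)]},
}
edge_list = co_medication_edges(patients)
```

The resulting counter plays the role of a weighted, undirected edge list: each key is a drug pair and each value is the number of patients combining that pair on the index date.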

Severe drug-drug interactions network
To create this network, we used a dataset derived from the Norwegian Electronic Prescription Support System (FEST). FEST is a national information service that provides common pharmaceutical data to the IT systems that are involved in the drug prescribing process, including systems used by physicians, hospitals and pharmacies 34 . Drug-drug interactions are part of the FEST database. In FEST, the DDIs are divided into 3 categories; interactions that should be avoided
The data from the NorPD contains attribute data including a patient identity number, sex, year of birth, and data about each individual dispensed drug. To create a network, this data needed to be reshaped. The first step was to create a file with only the medications that were used on the index date. Secondly, the file was aggregated to create an edge list. The edge list contains 2 variables defining the pairs of drugs and one variable with the number of users co-medicating with each pair of drugs. This edge list can be used by various software as described below. The process of data preparation is summed up in figure 5 and

Software to use for network analysis
There are many available tools to use for NA. We will focus on how to use the nwcommands package in Stata and the igraph package in R as well as visualization in Gephi. Other packages like "igraph" or "NetworkX" for Python are popular as well. All these packages can be used for visualizing and computing different network measures with differences in their integrated features and performance 35,36 .

Using Stata (nwcommands, nwANND)
Using the edge list, nwcommands will create an adjacency matrix 37 . The adjacency matrix is a square matrix that contains the relationships between every pair of nodes in the network. The adjacency matrix can be saved in Pajek format, which can later be imported and used by Gephi. In addition, nwcommands can display some network measures at both the network and node level.
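The relation between an edge list and an adjacency matrix can be illustrated with a short Python sketch. It is a language-neutral illustration with invented values, not a reproduction of what nwcommands does internally.

```python
# Toy weighted edge list: (drug 1, drug 2, number of users); invented values.
edge_list = [("drug_a", "drug_b", 120), ("drug_a", "drug_c", 45), ("drug_b", "drug_c", 8)]


def adjacency_matrix(edge_list):
    """Build a symmetric weighted adjacency matrix from an undirected edge list."""
    nodes = sorted({node for u, v, _ in edge_list for node in (u, v)})
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v, weight in edge_list:
        # Undirected network: fill both the (u, v) and the (v, u) cell.
        matrix[index[u]][index[v]] = weight
        matrix[index[v]][index[u]] = weight
    return nodes, matrix


nodes, matrix = adjacency_matrix(edge_list)
```

Row and column i of the matrix both refer to `nodes[i]`, and a cell holds the edge weight between the two drugs, or 0 when they are not connected.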
NwANND is used for calculating the assortativity coefficient 38 . The syntax can be found in supplementary 2.

Using R (igraph)
Igraph (https://igraph.org/) is a library for creating and analyzing graphs. It is widely used by network researchers to analyze graphs and networks. It is currently available for C, C++, Python, R and Mathematica.
One of the strengths of igraph is that it can be programmed with a high-level programming language and still be very efficient when handling large networks. In our R context, igraph integrates well with the visualization package (ggplot2) via the ggraph library.
Igraph uses an edge list and can link it with attribute data for each node as well. An example of network visualization using igraph and ggraph is given in Supplementary 2.
Figure 6 shows that the majority of anatomical drug classes were assortative. This means that drugs from the same anatomical group tend to be co-prescribed more often. We also investigated the assortativity of the drugs at the pharmacological level (3rd level of the Anatomical Therapeutic Chemical classification) in supplementary 3. Ego-networks can be seen by accessing the online networks we created and selecting individual nodes. The different network links can be found in the Methods section.
The top 10 edge weights for the severe DDIs in the co-medication network and for the co-medication-only network are shown in tables 2 and 3, respectively. Table 2 shows that the number of patients using drugs causing severe DDIs is relatively low (fewer than 1000 users for all pairs), while the numbers for the most commonly co-medicated drugs in table 3 are much higher, with acetylsalicylic acid (aspirin) and simvastatin having around 83000 users, representing almost 11% of the population.

Modularity analysis
We found 4 modules in the co-medication network and 11 modules in the severe DDI network.
For the co-medication network, there was one large community and 3 other smaller communities.

Discussion
A network visualizes the relationships of a dataset in one graph. This unique ability of data representation is combined with many measures that are helpful for many research disciplines. A starting point for generating any network is to select the nodes and define the edges. A precise definition of the edges allows the researcher to extract the correct information. NA is a well-suited approach to study complex systems. Although the approach has been widely used in many fields of research, only a few studies have examined drug relations in a network 18,19 .
Our results show that many network outcomes can be useful in the studies of medication use.
Moreover, some results are unique measures that only NA can provide, such as edge measures and module detection. Employing centrality measures in drug studies introduces an opportunity to observe the influence of the different drugs in the drug network. Determining this influence can be useful for clinicians and decision-makers.
After generating a network, some topological features have to be reported first to get a general idea about the network content and its basic characteristics. Network-level measures such as assortativity and density reveal many clues for further investigation. Centrality measures show how influential each node is in the network. It is possible for the same node to have a high centrality of one type and a low centrality of another 8
Our study has some limitations. As we used the DDD to outline the treatment episodes, we excluded the medications that have no defined DDD. As a result, the co-medication represented in our networks is lower than the actual co-medication at the index date.
NA also has some important limitations. As a tool, it can be used to explore data, to find unusual structures, to group nodes together and to find unusual individual nodes. However, it can be hard to interpret results from NA and it is only suited for hypothesis generation. It also cannot explore many sets of relationships between variables at the same time, nor can it determine causal relationships. For such research questions, hypothesis-testing methodologies are needed. However, in research focused on exploration, NA can be a valuable tool.

Conclusion
The main purpose of this paper was to demystify NA as a method. We have explained the terminology of network analyses and shown, with examples, how network analyses can be used for hypothesis generation. The online links to our networks visualize the data much better than a static picture can, and we hope that we have provided enough information and inspiration for readers to explore how they can use NA on their own data. We are confident that the future will see many new applications of NA and interesting results for researchers in social pharmacy and pharmacoepidemiology.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of competing interest
The authors declare that they have no conflicts of interest related to this study.