mFLICA: An R package for Inferring Leadership of Coordination From Time Series

Leadership is a process in which leaders influence followers to achieve collective goals. One special case of leadership is the initiation of coordinated patterns. In this context, leaders are initiators who start coordinated patterns that everyone else follows. Given a set of individual multivariate time series of real numbers, the mFLICA package provides a framework for R users to infer coordination events within the time series, the initiators and followers of these coordination events, and the dynamics of group merging and splitting. The mFLICA package also has a visualization function that makes the results of leadership inference easier to interpret. The package is available on the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=mFLICA.


Motivation and significance
Leadership is defined as a process in which leaders influence a group to achieve collective goals [1,2]. One definition of leadership is pattern initiation: leaders are initiators who start collective patterns (e.g. movement initiation, trends of stock closing prices) that everyone else follows [3]. Collective patterns, or coordination events, are emergent events of collective action that aim to reach collective goals [4]. In the time series context, a coordination event occurs when there exist intervals in which a similar pattern appears in all time series, possibly with a different time delay for each time series [3]. The leader of a coordination event is the time series that initiates the pattern before the others, which exhibit the same pattern with arbitrary time delays.
The concepts most closely related to leadership inference on time series are Granger causality [7] (e.g. the lmtest package [9]) and Transfer Entropy [10,11] (e.g. the RTransferEntropy package [11]). Both techniques can be used to infer whether time series A is a predictor of time series B, which is similar to the following relation concept in leadership inference. Nevertheless, leadership inference aims to identify patterns (e.g. moving along the same trajectory) that are distributed among time series, together with their initiators (leaders), rather than to find predictors. There are many leadership methods in the literature (see Table 2 [3,12]). To fill the gap in the literature, in this paper, I developed mFLICA, an R [13] package for leadership inference that is available on the Comprehensive R Archive Network (CRAN) [14]. The methodology of this framework is based on [3,12], which has been peer-reviewed and tested on both noisy simulation datasets and real-world datasets. mFLICA is a framework that is capable of:
• Inferring coordination events: the framework can infer and visualize coordination intervals that have high degrees of coordination; and
• Inferring dynamics of leaders and followers: the framework can infer leaders of coordination and their followers, both of which can change over time.
The mFLICA package gives scientists the opportunity to analyze coordinated activities and to generate scientific hypotheses about them that can be tested statistically and in the field. For the details of the algorithms and definitions used in this work, see the supplementary document; for the performance of the methodology and its theoretical properties, see [12,3].

Limitation
The mFLICA package is built on Dynamic Time Warping (DTW) [15]. Hence, it is an optimization framework that can be used to detect leadership patterns. The package does not provide any statistics as outputs; however, users can derive statistics from the framework outputs. For instance, a confidence interval for a specific individual being a leader can be derived by bootstrapping the time series of leaders.
The framework also assumes that there are no external influences beyond what is recorded in the datasets. This implies that if the datasets contain only partial information and omit an external influence, the leaders found by the framework might not be the true leaders. Hence, users should interpret their results carefully with this assumption in mind.
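The bootstrapping idea above can be sketched in base R. This is only an illustration under stated assumptions: the vector `leader_ids` (one leader ID per time step) is a hypothetical input built by hand here, not an object returned by the package, and the moving-block bootstrap with a block length of 20 is one possible choice for serially dependent data, not a method prescribed by mFLICA.

```r
# Percentile confidence interval for the proportion of time steps at which
# individual `id` is the leader, using a moving-block bootstrap.
# All names here are hypothetical and not part of the mFLICA API.
bootstrap_leader_ci <- function(leader_ids, id, block = 20,
                                n_boot = 1000, level = 0.95) {
  is_leader <- as.numeric(leader_ids == id)
  n <- length(is_leader)
  n_blocks <- ceiling(n / block)   # blocks needed to cover n time steps
  max_start <- n - block + 1       # last admissible block start
  props <- replicate(n_boot, {
    starts <- sample.int(max_start, n_blocks, replace = TRUE)
    idx <- unlist(lapply(starts, function(s) s:(s + block - 1)))[seq_len(n)]
    mean(is_leader[idx])
  })
  alpha <- (1 - level) / 2
  c(estimate = mean(is_leader), quantile(props, c(alpha, 1 - alpha)))
}

# Hand-made example: ID1 leads during 500 of 800 time steps.
set.seed(1)
leader_ids <- rep(c(1, 11, 1, 5), times = c(200, 100, 300, 200))
ci <- bootstrap_leader_ci(leader_ids, id = 1)
print(ci)
```

The sketch assumes a single leader per time step; when several factions coexist, the indicator would instead record whether the individual is among the leaders at that step.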

Software description
I provide details of mFLICA system architecture in Section 2.1, then I describe software functionality in Section 2.2.

Software Architecture
Given a set of time series and related parameters as inputs, mFLICA infers following networks, faction leaders and members, and degrees of coordination over time, and it provides related visualizations. Figure 1 gives an overview of the package architecture. The main function is mFLICA(), which calls two functions: getDynamicFollNet() and getFactions(). getDynamicFollNet() infers a dynamic following network from a set of time series, while getFactions() infers faction leaders and faction members for each time step in a dynamic following network. getDynamicFollNet() calls followingNetwork() to infer a following network for each time interval, and these networks together form the dynamic following network. The followingNetwork() function uses followingRelation() as its main engine to infer a following relation between each pair of time series. Lastly, getFactions() calls getReachableNodes() to find faction members, which are the nodes that have directed path(s) to the faction leader.

Software Functionalities with Examples
The main tasks of mFLICA are 1) inferring a dynamic following network, and 2) inferring faction leaders and members as well as leadership dynamics.
In this paper, I use a simulated dataset TS from [12], which contains 30 time series of movement, to demonstrate how to use mFLICA in leadership inference tasks. The dataset consists of two-dimensional time series of 30 individuals moving along the x-axis. The time series length is 800 time steps. There are three coordination events, during the time intervals [1,200], [201,400], and [401,600].
To infer a following relation between two time series, I deploy the dynamic time warping (DTW) package [16] to compute an optimal warping path between the two time series. Figure 2 shows simulated time series of movement from [12]. In this event, a leader was moving along the x-axis while the follower followed its leader after some time delay. The degree of following relation is defined below.
Here P_{L,F} is the optimal warping path of leader L and follower F inferred by DTW, and s(P_{L,F}) is the degree of following relation. There is no following relation when s(P_{L,F}) ∈ (−σ, σ).
In the next example, I use two time series from TS. In this dataset, TS[1, 1:100, ] is the time series of the leader while TS[2, 1:100, ] is the time series of the follower. I use only the first 100 time steps in this example. I run the code below to compute the optimal warping path between leader and follower.
On average, the follower required eight time steps to reach its leader. Next, I calculate s(P L,F ) in Eq. 1.
This implies that there is a high degree of following relation between leader and follower (s(P_{L,F}) ≈ 0.98). The followingRelation() function in the package computes s(P_{L,F}). I deploy the Sakoe-Chiba band [15] to speed up the DTW computation. The band width can be set via the lagWindow parameter. In this case, I set the band at 10% of the time series length (lagWindow = 0.1).
R> mFLICA::followingRelation(Y = follower, X = leader, lagWindow = 0.1)$follVal
[1] 0.99
I have s(P_{L,F}) = 0.99 in this example.
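To make the idea of a following degree concrete, here is a minimal base R sketch that does not call the package. It assumes that the degree is the average sign of the index difference along the warping path, which is my reading of the methodology in [3,12] and not a definition stated in this text. The functions dtw_path() and following_degree() are hypothetical names, the series are univariate, and the DTW routine is a plain textbook implementation with an optional Sakoe-Chiba band rather than the one used by mFLICA.

```r
# Textbook DTW between numeric vectors x and y; `band` is the Sakoe-Chiba
# half-width in time steps. Returns the warping path as rows (i, j).
dtw_path <- function(x, y, band = Inf) {
  n <- length(x); m <- length(y)
  cost <- matrix(Inf, n + 1, m + 1)   # cost[i + 1, j + 1] holds cell (i, j)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      if (abs(i - j) > band) next     # outside the band: leave as Inf
      d <- abs(x[i] - y[j])
      cost[i + 1, j + 1] <- d + min(cost[i, j], cost[i, j + 1], cost[i + 1, j])
    }
  }
  # Backtrack from (n, m) to (1, 1), preferring diagonal moves on ties.
  i <- n; j <- m
  path <- matrix(c(i, j), ncol = 2)
  while (i > 1 || j > 1) {
    step <- which.min(c(cost[i, j], cost[i, j + 1], cost[i + 1, j]))
    if (step == 1) { i <- i - 1; j <- j - 1 }
    else if (step == 2) { i <- i - 1 }
    else { j <- j - 1 }
    path <- rbind(c(i, j), path)
  }
  colnames(path) <- c("i", "j")
  path
}

# Assumed definition: mean sign of (follower index - leader index) on the path.
# Values near 1 mean the second series lags (follows) the first.
following_degree <- function(leader, follower, band = Inf) {
  p <- dtw_path(leader, follower, band)
  mean(sign(p[, "j"] - p[, "i"]))
}

# Synthetic example: the follower repeats the leader eight steps later.
leader <- sin(seq(0, 4 * pi, length.out = 100))
follower <- leader[pmax(seq_along(leader) - 8, 1)]
s_forward <- following_degree(leader, follower, band = 10)
s_reverse <- following_degree(follower, leader, band = 10)
print(c(s_forward, s_reverse))
```

Swapping the two arguments flips the sign of the result, which is why a threshold σ can separate "F follows L", "L follows F", and "no following relation".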

Inferring following network
The followingNetwork() function infers the adjacency matrix of a following network. The code below infers adjacency matrices from the set of simulated time series TS, which contains 30 time series. The low-coordination interval [1,60] and the high-coordination interval [61,120] are chosen for the example. I set σ = 0.5 for this example.
Figure 3 illustrates the adjacency matrices from both intervals. The weighted adjacency matrix mat1 in Figure 3 (a) is computed from the time interval [1,60], when the group initiated movement. In Figure 3 (b), the weighted adjacency matrix mat2 is computed from the time interval [61,120], when everyone followed the leader ID1, which implies it is a high-coordination event. mFLICA provides getADJNetDen() for computing the network density of an adjacency matrix. Based on the result, mat1 has a lower network density than mat2. The network densities can be computed below.
In Figure 3 (b), in the row of ID1, all individuals have high degrees of following ID1, which implies that ID1 is a leader in this interval. In contrast, there are no individuals followed by the majority in Figure 3 (a), which implies that this interval has low degrees of coordination.
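As an illustration of what a network density measures, the following base R sketch computes one common variant. The normalisation is an assumption: it divides the total off-diagonal weight by the number of unordered pairs, n(n-1)/2, on the grounds that at most one direction of following can hold per pair. getADJNetDen() may normalise differently, and network_density() is a hypothetical helper, not a package function.

```r
# Assumed density: sum of off-diagonal weights over the number of unordered
# pairs. This is an illustrative definition, not the package's documented one.
network_density <- function(adj) {
  n <- nrow(adj)
  sum(adj[row(adj) != col(adj)]) / (n * (n - 1) / 2)
}

# Toy matrices with adj[i, j] = degree to which i follows j.
n <- 5
low <- matrix(0, n, n)
low[2, 1] <- 0.6                       # a single weak following relation
high <- matrix(0, n, n)
high[lower.tri(high)] <- 1             # every pair has a following relation
print(c(network_density(low), network_density(high)))
```

Under this definition a matrix in which everyone follows a common leader, directly or through others, scores close to 1, while a matrix from an interval with few following relations scores close to 0.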

Inferring dynamic following network
In this part, I use the set of simulated time series TS, which has a length of 800 time steps. In this dataset, there are three coordination events: [1,200], [201,400], and [401,600]. I set the time window ω = 60, the time shift δ = 6, and the threshold σ = 0.5. The next commands are used to infer a dynamic following network of TS. To obtain the degree to which ID19 follows ID1 at time step 150, I can use the command below. The time series of network densities can be plotted using the plotMultipleTimeSeries function below.
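To show how the time window ω and the time shift δ partition the series, the sketch below lists the sliding intervals for the parameters above. It is an assumption about the windowing scheme: incomplete windows at the end of the series are simply dropped, and the package may treat the tail differently. window_intervals() is a hypothetical helper.

```r
# Start and end indices of sliding windows of length `window`, moved forward
# by `shift` steps; windows that would run past n_steps are dropped.
window_intervals <- function(n_steps, window, shift) {
  starts <- seq(1, n_steps - window + 1, by = shift)
  data.frame(start = starts, end = starts + window - 1)
}

w <- window_intervals(n_steps = 800, window = 60, shift = 6)
print(head(w, 3))
print(nrow(w))
```

A smaller δ gives a smoother picture of how the network changes but requires inferring more following networks, each of which involves DTW on every pair of time series.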

Inferring leadership dynamics
I use the interval [25,45] in the simulation dataset to demonstrate a time when more than one faction occurs simultaneously. After a following network has been inferred, getFactions() takes a binary version of the adjacency matrix as its input. The code above shows that there are two faction leaders in the interval [25,45]: ID1 and ID11. This implies that there are two factions. The next step is to query the faction members of ID1's faction as well as its faction size ratio. Note that a leader is also a member of its own faction. Since there are 30 individuals, almost everyone is a member of ID1's faction. However, the faction size ratio of 0.5 indicates that the faction members are not yet coordinated in following the same pattern. Next is the code for querying details about the faction led by ID11. We can see that this faction has fewer members. Note that one individual can belong to more than one faction, since the individual might follow a pattern that is partially similar to several leaders' patterns.
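The faction logic described above can be sketched in base R without the package. The sketch makes two assumptions that should be checked against [3,12]: a leader is taken to be a node that follows nobody but is followed by at least one node, and adj[i, j] = 1 means that i follows j. reachable_to() and find_factions() are hypothetical names, not the package's getReachableNodes() and getFactions().

```r
# All nodes that have a directed path to `target`, plus the target itself,
# found by breadth-first search along incoming edges.
reachable_to <- function(adj, target) {
  visited <- rep(FALSE, nrow(adj))
  visited[target] <- TRUE
  queue <- target
  while (length(queue) > 0) {
    node <- queue[1]
    queue <- queue[-1]
    followers <- which(adj[, node] == 1 & !visited)
    visited[followers] <- TRUE
    queue <- c(queue, followers)
  }
  which(visited)
}

# Assumed leader rule: out-degree zero and in-degree greater than zero.
find_factions <- function(adj) {
  leaders <- which(rowSums(adj) == 0 & colSums(adj) > 0)
  members <- lapply(leaders, function(l) reachable_to(adj, l))
  names(members) <- paste0("ID", leaders)
  list(leaders = leaders, members = members)
}

# Toy network: 2 follows 1, 3 follows 2, 4 follows 5, and 3 also follows 5.
adj <- matrix(0, 5, 5)
adj[2, 1] <- 1; adj[3, 2] <- 1; adj[4, 5] <- 1; adj[3, 5] <- 1
f <- find_factions(adj)
print(f)
```

In the toy network, node 3 has a path to both leaders, so it appears in both factions, which mirrors the remark that one individual can belong to more than one faction.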
Next, I show how to use mFLICA to infer dynamics of factions. In other words, I would like to find changes of faction members and faction leaders over time. Given a set of time series TS as an input along with related parameters: time window ω = 60, time shift δ = 6, and the threshold σ = 0.5, we run mFLICA() below.
The result of the plot is in Figure 5. According to the ground truth, there are three coordination events. First, during the time interval [1,200]

Impact
In the past, many social science questions, including those about leadership, could not be answered with quantitative approaches due to the lack of resources and data. Today, because of innovation and technological development, data on the behaviors of humans, animals, and even man-made systems are available from both online social networks and real-world sensors. These datasets open opportunities for researchers to ask questions about collective behaviors and to gain insight into them quantitatively. In leadership inference, the mFLICA package enables computer scientists, social scientists, and other researchers to quantitatively test hypotheses regarding leadership of coordination.
In the commercial realm, mFLICA can help companies measure the influence of their products among customers from customers' records. Understanding the effects of products is crucial for companies, since it determines whether they gain or lose profits.
In the research realm, there are already some examples in the literature of new research that uses mFLICA. In online social behavior analysis, the recent work in [17] used mFLICA to analyze time series of records from Twitter users in order to obtain the structure of arguments in online debates about a "no-deal" Brexit. In animal behavior, the work in [18] applied mFLICA to GPS trajectories of 26 baboons and found that baboons do not follow any particular dictator but instead follow the group, which is consistent with the biological result in [19]. Additionally, in social science, the work in [20] stated that a leadership-inference framework (e.g. mFLICA) has the potential to be used for obtaining causal relationships and influence between people.

Conclusions
In this paper, the details of the mFLICA package for inferring leadership of coordination from time series are provided. Leaders are defined as individuals who initiate some patterns that others then follow with some time delays. A following relation between time series can be detected by analyzing an optimal warping path of Dynamic Time Warping (DTW), which is the main component that mFLICA deploys.
Given a set of time series and related parameters, the mFLICA package can infer a following relation between two time series, following networks, faction leaders, faction members, degrees of coordination, and faction size ratios for each time step.
The network densities inferred by mFLICA indicate the magnitude of coordination: how many time-series individuals follow the same pattern in a given time interval. The faction size ratios provide information regarding faction dynamics: the changes of faction leaders and/or faction members over time. I provided examples of how to use mFLICA to solve several tasks in leadership inference. The framework can be applied to any multivariate time series.

Conflict of Interest
I wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.