Detecting early‐warning signals of influenza outbreak based on dynamic network marker

Abstract Seasonal outbreaks of influenza cause respiratory illness worldwide, and even death, in all age groups. Given early-warning signals preceding an influenza outbreak, timely interventions such as vaccination and isolation management can effectively decrease morbidity. However, real-time prediction of influenza outbreaks is usually difficult because the underlying dynamics intertwine biological and social systems. By exploiting rich dynamical and high-dimensional information, our dynamic network marker/biomarker (DNM/DNB) method opens a new way to identify the tipping point prior to the catastrophic transition into an influenza pandemic. To detect early-warning signals before influenza outbreaks with the DNM method, historical records of clinic hospitalization caused by influenza infection between 2009 and 2016 were extracted and assembled from public records of Tokyo and Hokkaido, Japan. Based on these hospitalization records, the DNM provided an early-warning signal with an average lead of 4 weeks before each seasonal influenza outbreak, offering an opportunity to apply proactive strategies to prevent or delay the onset of an outbreak. Moreover, the study of dynamical changes in hospitalization across local district networks unveils the influenza transmission dynamics, or landscape, at the network level.


| INTRODUCTION
Despite current approaches to prevention and control, seasonal influenza remains a significant cause of morbidity and mortality worldwide. 1 Once infected by influenza virus, people, especially the elderly and children, are at high risk of further deterioration, including circulatory diseases, severe respiratory illness, and other life-threatening complications. 2,3 Influenza pandemics also impose a considerable economic burden, including direct medical costs and indirect losses such as substantial workplace absenteeism. The estimated average direct medical cost of influenza in the United States reaches $10.4 billion each year, 4 and the actual annual cost is likely higher.
Early detection and recognition of upcoming influenza outbreak, and timely public health prevention including vaccination schedule and control strategy, are critical in reducing the pandemic magnitude and distribution. 5,6 However, it is usually a challenging task to achieve the real-time prediction of influenza outbreak due to its complex dynamics involving both biological systems and social systems. In addition, surveillance capacity for such detection can be costly, and many countries lack the public health infrastructure to identify outbreaks at their earliest stages. Furthermore, there may be economic incentives for countries to not fully disclose the nature and extent of an outbreak. 7,8 Therefore, a new computational method is required to predict the outbreak of epidemic diseases only based on available data, thus simplifying information gathering and monitoring processes.
The dynamic network marker/biomarker (DNM/DNB) is our recently proposed method. It is a generalized methodology to identify the tipping point, or pre-transition state, which is a critical state before a catastrophic event, 9,10 by mining the dynamical information from both horizontal high-dimensional data and longitudinal historical records. If the influenza outbreak is regarded as a tipping point at which the system undergoes a critical transition, the dynamical process of the system can generally be expressed by three states (Figure 1B), that is, a normal state with high resilience, a pre-outbreak state (the critical state) with low resilience, and an after-outbreak state with possibly high resilience. The normal state is a steady stage, during which there are not many patients visiting clinics. The pre-outbreak state is defined as the limit of the normal state immediately before the tipping point. In this pre-outbreak stage, the process is usually reversible to the normal state if appropriately treated, implying the criticality of the pre-outbreak state. Unlike the traditional detection of the after-outbreak state, the DNM enables the identification of the pre-outbreak state or critical state, which generally shows no clear abnormalities but carries a future trend of deterioration or critical transition. This method has recently been successfully applied to a variety of biological processes to detect the early-warning signals of an irreversible catastrophic stage, such as the cell differentiation process, 11 the process of cell fate decision, 12 the critical transition in the immune checkpoint blockade-responsive tumour, 13 the multi-stage deteriorations of T2D, 14 acute lung injury, 15 HCV induced liver cancer, 16 cancer metastasis, 17 and many others. 18-21
In this study, the DNM method was employed to explore the dynamical information based on a combination of the city network and the high-dimensional clinic hospitalization records, which are from over 278 clinics distributed in 23 wards in Tokyo, Japan, and 225 clinics distributed in 30 districts in Hokkaido, Japan. The results show that the DNM method, used as a real-time surveillance system, successfully identified the critical state just before the outbreak of influenza. Such a system may enable a rapid response for preventive care or the implementation of interventions against a health epidemic. In addition, this work unveils the influenza transmission dynamics or landscape at a local district network level, based on the measured data. The advantage and effectiveness of the DNM-based system are also demonstrated by a comparison between DNM and other surveillance systems of flu pandemic.

| Dynamical network marker or dynamic network biomarker
Influenza viruses circulate around the world every year, causing financial losses, suffering, and death. The dynamical process of a flu outbreak can be modeled by three states or stages (Figure 1), similar to disease progression 9: the before-outbreak state, which is a stable state with high resilience or high robustness to perturbations; the pre-outbreak/critical state, which is the tipping point just before the catastrophic shift into the outbreak state and is thus characterized by low resilience or low robustness due to its critical dynamics, but is still reversible to the before-outbreak state with appropriate control management; and the outbreak state, which is another stable state with high resilience or high robustness. Clearly, it is of great importance to identify the pre-outbreak state, which holds the key to applying effective control management to prevent the catastrophic flu outbreak.
However, unlike the outbreak state, in which there are obvious signs including a huge number of outpatient visits, the pre-outbreak state is difficult to identify because there are generally no significant signs or differences between the before-outbreak state and the pre-outbreak state. The dynamic network marker/biomarker (DNM/DNB) method was developed to quantitatively identify the tipping point or the critical state during the dynamic evolution of a complex system based on the observed data. Theoretically, when a complex system is near the critical point, there exists a dominant group (a dominant group of variables or members) defined as the DNM features, which satisfy the following three necessary conditions based on the observed data 9:
• The correlation (PCC_in) between any pair of members in the DNM group rapidly increases;
• The correlation (PCC_out) between one member of the DNM group and any other non-DNM member rapidly decreases;
• The standard deviation (SD_in) or coefficient of variation for any member in the DNM group drastically increases.
In other words, the above conditions can be approximately stated as: the appearance of a strongly fluctuating and highly correlated group of features implies the imminent transition into the flu outbreak. These three conditions are therefore adopted to quantify the tipping point as the early-warning signal of disease, and the identified dominant group of features constitutes the DNM members.
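The three statistics named in the conditions above can be sketched for one time window as follows. This is a minimal illustration rather than the authors' implementation: the function names, the toy series and the candidate group are hypothetical, and averaging absolute correlations is an assumption.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(vx * vy)


def std(x):
    """Population standard deviation of a sequence."""
    m = sum(x) / len(x)
    return math.sqrt(sum((a - m) ** 2 for a in x) / len(x))


def mean(values):
    """Arithmetic mean, returning 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def dnm_statistics(series, group):
    """Return (PCC_in, PCC_out, SD_in) for a candidate DNM group.

    series: dict mapping a feature name to its values inside one window.
    group:  set of feature names proposed as DNM members.
    """
    inside = [k for k in series if k in group]
    outside = [k for k in series if k not in group]
    pcc_in = mean([abs(pearson(series[a], series[b]))
                   for a, b in combinations(inside, 2)])
    pcc_out = mean([abs(pearson(series[a], series[b]))
                    for a in inside for b in outside])
    sd_in = mean([std(series[a]) for a in inside])
    return pcc_in, pcc_out, sd_in


# Hypothetical 5-week window for four wards; A and B fluctuate together.
window = {
    "A": [1, 5, 2, 8, 3],
    "B": [2, 10, 4, 16, 6],
    "C": [4, 4, 5, 4, 5],
    "D": [3, 3, 3, 4, 3],
}
pcc_in, pcc_out, sd_in = dnm_statistics(window, {"A", "B"})
```

In this toy window the pair (A, B) is strongly correlated and strongly fluctuating, which is the pattern the three conditions describe for a DNM group near the tipping point.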
A 2-fold change threshold is usually applied to recognize significant changes in the DNM score and obtain the warning signal. The DNM theory has been applied to a number of analyses of disease progression and biological processes to predict the critical states as well as their driving factors. 9-20,22 In this work, by considering the flu outbreak as a non-linear dynamical process, we further applied the DNM method to detect the tipping point or the early-warning signal of flu outbreak. To quantify the critical state, a criterion I_DNM that combines the above three statistical conditions was used as the signal of the critical point. Thus, from the observed data of a sample, whenever a group of features appears with a high I_DNM score, this group of features is the DNM group and the state of this sample is considered to be near the tipping point. Therefore, from the hospitalization records of each sample, we can identify the DNM members and further quantify whether or not this sample is near the critical state using the I_DNM score.
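As an illustration of how the three conditions can be folded into a single score, a composite form often seen in the DNB literature is sketched below. It is given here only as an assumption about the shape of I_DNM, using the symbols SD_in, PCC_in and PCC_out from the three conditions:

```latex
\[
  I_{\mathrm{DNM}} \;=\; \frac{SD_{\mathrm{in}} \cdot PCC_{\mathrm{in}}}{PCC_{\mathrm{out}}}
\]
```

Under this assumed form, each condition pushes the score in the same direction: a rising SD_in and PCC_in enlarge the numerator, while a falling PCC_out shrinks the denominator, so the score increases sharply as the system nears the tipping point.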
To further reliably identify the critical state of flu outbreak, we developed a new method called the landscape DNM; the detailed algorithm is provided below.

FIGURE 1 Schematic illustration of detecting early-warning signals of influenza outbreak based on the DNM method. A, The historical information of clinic hospitalization caused by influenza infection between 1 January 2009 and 31 December 2016 was extracted and assembled from public records of Tokyo and Hokkaido, Japan. B, According to the DNM theory, the process of a time-dependent non-linear system is divided into three states, including a normal state, a pre-outbreak state and an after-outbreak state. The abrupt increase in the DNM score indicates the pre-outbreak state, ie, the tipping point just before the upcoming catastrophic influenza outbreak that results in a surge of clinic-visiting patients. C, Based on the historical and current clinic records, and regional geographic characteristics of a city, the DNM score is able to provide the early-warning signals of the upcoming influenza outbreak as a real-time indicator monitoring both the local and global records as well as the network structure.

| Landscape DNM score
Given a network structure for the observed variables, an efficient method to detect DNM, called the landscape dynamic network marker (or landscape dynamic network biomarker), is proposed by employing the local-landscape method on the basis of the three DNM statistical properties. 9 Specifically, we first mapped the historical records of flu patients to the city network (Figure 2A). Second, the network was partitioned into many local networks. Each local network contained a centre node/ward and all of its first-order neighbours based on the network structure. The local-network index I_score of a centre node at time point t for a local network with n members (ie, one centre node with n-1 first-order neighbouring nodes) was then calculated from the following terms:

\[ |\Delta SD_t(\mathrm{in})| = \frac{\sum_{i=1}^{n} |SD_t(i) - SD_{t-1}(i)|}{n} \]

is the average differential standard deviation (in absolute value) of the nodes inside the local network;

\[ |\Delta PCC_t(\mathrm{in})| = \frac{\sum_{i=1,\, j=1}^{n} |PCC_t(i,j) - PCC_{t-1}(i,j)|}{n \times n} \]

is the average differential Pearson's correlation coefficient (in absolute value) inside the local network, ie, both nodes i and j are in the same local network; and

\[ |\Delta PCC_t(\mathrm{out})| = \frac{\sum_{i=1,\, j=1}^{n} |PCC_t(i,j) - PCC_{t-1}(i,j)|}{n \times n} \]

is the average differential Pearson's correlation coefficient (in absolute value) between a member (node i) in the local network and a node (node j) outside it.
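The partition into local networks and the three differential terms can be sketched as below. This is a hedged illustration, not the authors' code: the adjacency map, ward names and window values are hypothetical, and the outside term is averaged over the actual number of (inside, outside) pairs, which is an assumption. How the terms are combined into the final index is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(vx * vy)


def std(x):
    """Population standard deviation of a sequence."""
    m = sum(x) / len(x)
    return math.sqrt(sum((a - m) ** 2 for a in x) / len(x))


def local_network(adjacency, centre):
    """Centre node followed by its first-order neighbours."""
    return [centre] + sorted(adjacency[centre])


def differential_terms(prev, curr, members, all_nodes):
    """Return (|dSD(in)|, |dPCC(in)|, |dPCC(out)|) between windows t-1 and t.

    prev, curr: dicts mapping a node to its values in windows t-1 and t.
    members:    nodes of one local network.
    all_nodes:  every node in the city network.
    """
    n = len(members)
    outside = [v for v in all_nodes if v not in members]
    d_sd = sum(abs(std(curr[i]) - std(prev[i])) for i in members) / n
    d_pcc_in = sum(
        abs(pearson(curr[i], curr[j]) - pearson(prev[i], prev[j]))
        for i in members for j in members
    ) / (n * n)
    if outside:
        # Assumption: average over the n * len(outside) inside/outside pairs.
        d_pcc_out = sum(
            abs(pearson(curr[i], curr[j]) - pearson(prev[i], prev[j]))
            for i in members for j in outside
        ) / (n * len(outside))
    else:
        d_pcc_out = 0.0
    return d_sd, d_pcc_in, d_pcc_out


# Hypothetical city network of four wards and two consecutive 5-week windows.
adjacency = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
prev = {"A": [2, 2, 3, 2, 3], "B": [1, 2, 1, 2, 1],
        "C": [3, 3, 2, 3, 3], "D": [1, 1, 2, 1, 1]}
curr = {"A": [2, 6, 3, 9, 4], "B": [1, 5, 2, 8, 3],
        "C": [3, 7, 4, 10, 5], "D": [1, 1, 2, 1, 2]}

members = local_network(adjacency, "A")
d_sd, d_pcc_in, d_pcc_out = differential_terms(prev, curr, members, list(adjacency))
```

Repeating this for every centre node and every time point gives one set of terms per node per week, which is the raw material for a landscape of local scores.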
Theoretically, when the system approaches the tipping point, ie, t ∈ critical state and t-1 ∉ critical state, there are three cases for the local network of a centre node:
• In the local network, all the nodes are DNM members;
• In the local network, there are both DNM and non-DNM members;
• In the local network, all the nodes are non-DNM members.
The critical behaviours of a centre node in each of these three cases are shown in Table 1.
Thus, the network-based index, I_t, can quantitatively characterize the criticality of the state for each DNM member or node. Each node has an I_t value, and hence the I_t scores of all nodes over time construct a landscape, as shown in Figure 3.
When the system approaches the critical state, I_t of each DNM node increases drastically based on the three statistical conditions of DNM, while I_t of the non-DNM nodes may show no significant change.
Obviously, during the critical transition, the DNM group has an abil-

| Data normalization
For each ward or district, the raw data were averaged over the total number of clinics within the ward/district, giving a count per clinic. This normalization is directly related to the population of each ward/district, since the population is roughly proportional to the number of clinics.
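A minimal sketch of this per-clinic normalization is given below; the function name, ward labels and counts are hypothetical and serve only to show the arithmetic.

```python
def normalize_by_clinics(weekly_counts, clinic_counts):
    """Divide each ward's weekly patient counts by its number of clinics."""
    return {
        ward: [count / clinic_counts[ward] for count in counts]
        for ward, counts in weekly_counts.items()
    }


# Hypothetical raw weekly counts and clinic numbers for two wards.
weekly_counts = {"ward_a": [20, 30, 50], "ward_b": [6, 9, 12]}
clinic_counts = {"ward_a": 10, "ward_b": 3}

normalized = normalize_by_clinics(weekly_counts, clinic_counts)
```

After normalization the two wards are on a comparable per-clinic scale even though their raw counts differ considerably.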

| Sliding window
The raw data were processed with a sliding window of width 5, that is, both the standard deviation and the correlation coefficient are calculated from the data within every 5 consecutive weeks.
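The windowing step can be sketched as follows. The step size of one week is an assumption, since only the window width of 5 is stated, and the weekly series is hypothetical.

```python
import math


def sliding_windows(values, width=5):
    """Return every run of `width` consecutive values, shifted by one step."""
    return [values[i:i + width] for i in range(len(values) - width + 1)]


def std(x):
    """Population standard deviation of a sequence."""
    m = sum(x) / len(x)
    return math.sqrt(sum((a - m) ** 2 for a in x) / len(x))


# Hypothetical normalized weekly counts for one ward.
weekly = [1, 1, 2, 1, 1, 3, 6, 12]

windows = sliding_windows(weekly, width=5)
window_sd = [std(w) for w in windows]
```

Each window yields one standard deviation per ward, and pairs of wards yield one correlation coefficient per window, so the statistics become time series that can be compared between consecutive windows.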

| Detecting the seasonal flu outbreak in Tokyo
The flu transmission dynamics before a sudden outbreak are usually too complicated to be fully expressed mathematically, since they unfold in high-dimensional spaces involving both biological and social systems. The drastic or qualitative transition in a local system or network, from a normal state to an after-outbreak state, corresponds to a so-called bifurcation point in dynamical systems theory. 21,23 If the system is approaching a bifurcation point, it will eventually be constrained to a one- or two-dimensional space (ie, the centre manifold in a generic sense), in which the dynamical system can be expressed in a very simple form. This is the theoretical basis for developing a general indicator that can detect the critical state of flu outbreak based only on the observed data.
As shown in Figure 1

Notation: the system is near a tipping point, ie, it moves from time point t-1 to t, with t ∈ critical state and t-1 ∉ critical state.
1. "↗" represents an increase of the index; "↘" represents a decrease of the index; "→" represents no significant change in the index.
2. "D" stands for the DNM members, or the PCC with DNM members; "N" stands for the non-DNM nodes, or the PCC with non-DNM members.
3. SD_t is the average standard deviation at time t; PCC_t(in) is the average Pearson's correlation coefficient between two nodes inside the local network; PCC_t(out) is the average Pearson's correlation coefficient between a node inside the local network and a node outside.

(Table S1). The dynamical evolution of the network shows that the DNM-based system uncovers the epidemic situation and transmission trends, which better present the transmission dynamics at a system network level.

| Application of DNM in Hokkaido region
As another application of the DNM to influenza outbreaks, we also used it to detect the early-warning signals of flu outbreak in the Hokkaido region, as shown in Figures S2-S5. It is seen from Figure S5 Figures S2 and S3). Figure S4 shows the dynamics of the regional network in 2014 in terms of local DNM scores.

| Performance comparison with other methods
The performance of the DNM score is compared with that of other systems based on machine learning algorithms (Figure 6). Specifically, a popular surveillance system for flu pandemics is based on logistic regression. 24-26 Figure 6 shows that, given only hospitalization records, the DNM-based system performs better than a system based on logistic regression.
Actually, the DNM method has natural advantages over traditional machine learning algorithms in the following aspects.

FIGURE 6 The performance of DNM-based and machine-learning-based methods. Using only the hospitalization records, the DNM-based surveillance system performs better than the logistic regression. The AUC of DNM is 0.898 while that of logistic regression is 0.839. The performance comparison is carried out based on the data of Tokyo. Note that the DNM-based method generally has no overfitting problem, because it relies on the three statistical conditions of DNM without training data, in contrast to the machine-learning-based methods that depend on the training data.
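For readers unfamiliar with the AUC used in this comparison, the sketch below computes it from its rank-based definition: the probability that a randomly chosen positive week receives a higher score than a randomly chosen negative week. The labels and scores are hypothetical, not data from the study, and the function name is illustrative.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical weekly labels (1 = pre-outbreak week) and warning scores.
labels = [0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.2, 0.8, 0.9, 0.7, 1.5]

example_auc = auc(scores, labels)
```

The same function can be applied to any scoring rule, whether a DNM-style score or a logistic regression probability, so that the two can be compared on identical labels.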