Exploring Bipartite Network Approach in Hand, Foot and Mouth Disease Hotspot Identification

Mathematical modeling of hand, foot, and mouth disease (HFMD) mainly focuses on compartmental modeling approaches. These classify the human population into compartments and assume homogeneity, that is, every individual has an equal chance of contacting any other individual in the population. However, the transmission of HFMD is complicated and dynamic, driven by the interactions of intertwined biomedical and social factors. Describing disease transmission dynamics that involve a high-dimensional space is mathematically challenging. The graph-theoretic bipartite network modeling (BNM) approach has the potential to handle this challenge by abstracting the real-world disease transmission system and incorporating the individual features of the bipartite nodes. This study aims to exploit the advantages of the BNM approach in capturing the heterogeneous features of the entities within a disease transmission system. It explores the BNM approach for modeling the transmission of HFMD in Kuching, Malaysia, and for identifying the hotspot, using a four-stage methodology adapted from the BNM methodology framework. The bipartite HFMD contact (BHC) network is formulated with a basic building block consisting of location and human nodes, and the individual parameters of both node types are incorporated. The resulting BHC network comprises 10 human nodes, 20 location nodes, and 23 edges. The six top-ranked location nodes identified agree with the chosen benchmark system. The potential HFMD hotspots are thus identified from the ranking of the location nodes. The results of this study could enable public health authorities and related policymakers to customize timely and effective measures and policies accordingly.


Introduction
Hand, foot, and mouth disease (HFMD) is a seasonal and highly contagious disease that has caused health care alert throughout the world. It has been prevalent among the Asia-Pacific countries since the 1990s and sees large-scale recurrence every few years 1 . HFMD is a common febrile illness associated with Non-Polio Enterovirus infections (NPEV) 2 caused by multiple Enteroviruses (EV) serotypes. The main etiologic agents reported to cause HFMD are human enterovirus A71 (EV71), coxsackievirus A6 (CA6), and coxsackievirus A16 (CA16) 3 . The typical symptoms include fever, painful sores, and vesicles in the mouth and skin rashes, maculopapular or vesicular on hands, feet, buttocks, and sometimes on elbows and knees 4 . However, more severe illnesses like meningitis and encephalitis are also reported, and critical complications associated with neurological, cardiovascular, and respiratory problems are noticed occasionally 5,6 and they could be fatal 1 . Although the vaccine for EV-A71 has seen encouraging progress in some countries in Asia 7 , there are still no warranted antiviral treatments or vaccines available to combat HFMD 8 .
Infants and young children under five years old are significantly more susceptible to the disease than the rest of the population 9 . In a study by Chua and Kasri 10 , it is reported that as many as 82.6% of HFMD cases are children aged below 6, and the majority are among those aged 1 (18.8%) and 2 (17.9%). After 3 to 5 days exposed to the virus, HFMD patients start showing the related symptoms 1 . Most patients showing the self-limiting illness mentioned above typically recover in 7 to 10 days without much medical treatment 11 . It is reported that children who succumbed to the illness are those within six days of onset of the illness, whereas death usually ensued within 24 hours after the onset of cardiac instability 10 .
HFMD incidences in Malaysia showed its first outbreak in Sarawak and then throughout the country in 1997 mainly due to the EV71 pathogen and reported 41 deaths 12 . Subsequently, a similar outbreak recurs around every two to three years 13 . In 2018, a large HFMD outbreak in Malaysia recorded more than 40,000 HFMD cases, where fatalities were reported 14 . The standard public health measures to handle this situation include issuing quarantine orders to all infected persons and reducing transmissibility by restraining close contact and mobility of the infected persons and children from high-risk age groups. This involves the closure of nurseries, playschool, kindergartens, playgrounds, and swimming pools, and discouragement from bringing young children to crowded public places. As a result, it has caused substantial social pressure on the communities specifically and the nation largely. Although the fatality rate is low, the social consequences and the extensive cost associated with the economy and public health sectors caused by large-scale outbreaks of HFMD are immense 4 . It has caused a high disease burden for children across the world 13 .
The primary factors identified are closely related to meteorology and the social interaction of the community 15 . People get infected with HFMD through contact with an infected patient, virus-contaminated surfaces, water and food, and the patient's respiratory droplets 6 . Under suitable conditions outside the host, the pathogen EV71 is able to survive for as long as three days and can be found in the fecal samples of infected patients for more than seven weeks 16 . Besides, the coxsackievirus is reported to survive for at least two weeks outside the host under favorable conditions 15 . The virus favors an environment with high humidity and temperature, with dry and non-porous surfaces 15 . Studies show that EV71 can tolerate disinfectants containing even 75% alcohol, and that 95% ethanol is unable to deactivate it entirely, although this is the most recommended minimum concentration 16 . Humans are reported as the only known host of EV71; thus, human-to-human transmission is generally assumed 1 .
Mathematical modeling of HFMD characteristics, transmission, prediction, and outbreak control has been progressing positively to assist in combating the disease and the consequences it brings. The methodology used is predominantly the conventional compartmental modeling approach. It commonly classifies the human population into three basic compartments: susceptible, infected, and recovered (SIR). More compartments, such as the exposed (E) compartment in the SEIR model, are added to refine the model further. Some studies on HFMD modeling in Sarawak and Malaysia are discussed here. A simple deterministic SIR model for HFMD by Chuo and Labadin 17 confirmed the rapid transmissibility of the disease, where the susceptible persons should be monitored closely. A dynamic SEIPR model for HFMD by Chan et al. 18 incorporates the virus incubation period (E) and the post-infection virus shedding period (P) as additional compartments besides the basic SIR compartments. It outperforms the SIR model and suggests that the liability of the HFMD outbreaks is possible based on the minimum proportion of a population. With statistical approaches as the basis, Mohammad Sham et al. 19 and Mohammad Sham et al. 20 fit the trend surface model and the time series (autoregressive moving average (ARMA)) model to the HFMD data, respectively, for modeling and forecasting. In a study by Mohammad Sham and Krishnarajah 21 , Geographical Information System (GIS) mapping is used to describe and observe the spatial pattern of the transmission of HFMD.
Nonetheless, researchers have noted that the transmission of HFMD is complicated and dynamic, driven by the interactions of intertwined biomedical and social factors 4 . Thus, they conceded that it is mathematically challenging to describe disease transmission dynamics involving a high-dimensional space. Besides that, the compartmental modeling approach ignores heterogeneity and random effects, which are vital considerations in the development of an epidemic 22 . The bipartite network modeling (BNM) approach stands out in this respect, as it is able to abstract the real-world system as a network by capturing the dynamic interaction between the two different types of nodes in the network. This is achieved by integrating the individual features that define each bipartite node, capturing the heterogeneity of the abstracted real-world system 23 . Network models are flexible models capable of illustrating the interaction between different components in a complex system. They are also suitable for demonstrating the transmission of diseases in many forms 24 . However, studies that employ BNM to survey HFMD by incorporating the individual features defining each bipartite node are scarce. This paper aims to fill this gap and explore the applicability of the BNM approach in identifying the HFMD hotspot. The contribution of this study lies in results that indicate the potential of the BNM approach to advance the study of the characteristics, transmission, prediction, and control of HFMD outbreaks. The paper is organized as follows: Section 2 discusses the methods and material used in this study. Section 3 provides the results and their corresponding discussion. Lastly, Section 4 presents the conclusion of this study with recommendations for potential future work.

Methods and Material
This section presents the methodology and material used in this study.

Bipartite Network Modeling in Epidemiology
The epidemiology study on HFMD employing the BNM approach is made possible with the epidemiology triangle (ET) 25 that shows the three main components of any disease transmission: the infectious agent, host, and environment. It can be translated intuitively into a three-node graph. Since environment properties define a location, the environment component in an ET can logically be represented as the location component, which is a tangible component 24 . Similarly, the agent component in this study is the HFMD viruses, which stay on the surfaces of objects (at a location) for a long period. Thus, an agent component is inseparable from the location component in this study. Subsequently, the modified ET used in this study consists of two main components: location (L) and host (H) as depicted in Figure 1. The environment (V) component is enveloped within the location, which consists of the HFMD etiologic agents. The edge (E) that connects the L and H components is dependent on the virus survival duration on the surface of an object, signified by the time component placed above the edge shown in Figure 1.
The modified ET in Figure 1 intuitively represents a two-node graph. Both the L and H nodes have distinct features by nature. This is a bipartite graph that consists of two different types of nodes, implying the heterogeneous nature of the graph. It is taken as the basic building block for the HFMD bipartite network this study intends to model. Transmission of HFMD in this study is assumed to happen in the location component, where the location acts as a medium for the virus to transmit from an infected patient to a new host. In this study, a graph refers to an unweighted bipartite graph that gives a visual representation of a bipartite network, while a network is a weighted bipartite graph that conveys both the topological and functional relationships between the bipartite nodes and their respective links. The weight here denotes the link weight, which is a measure of affinity between the bipartite nodes of the network 24 .

Bipartite Network Modeling Methodology
Since the typical first stage of formalizing the feasibility of employing a BNM has been established by Kok et al. 24 , the methodology of this study involves four stages as presented in Figure 2, adapted from Liew et al. 26 . First, the bipartite graph structure of the HFMD transmission network, comprising the locations and their hosts, is formed. Second, the HFMD transmission network, coined the Bipartite HFMD Contact (BHC) network, is formulated, with the parameters for the location and host nodes and the link weight quantified. Third, a search algorithm is implemented on the BHC network to rank the location nodes and identify the HFMD hotspot. Lastly, the results obtained are verified, validated, or both.

Study Area and Assumption
The study location is Kuching, Sarawak, Malaysia. The data used in this study are intended to depict the number of HFMD cases in Kuching, Sarawak from June to July 2019. As the record of HFMD cases in Kuching from June to July 2019 is not available, the data are generated based on the total number of HFMD cases recorded in Sarawak from the first to the end of Epidemiology (EPID) Week 45, which was 47,900 cases 27 . The data generation process follows that of the dengue study 24 . By averaging the 47,900 cases in Sarawak over the first 45 EPID weeks, it is assumed that there was an average of 1,000 HFMD cases in Kuching from June to July 2019. This is obtained by assuming 8 weeks from June to July 2019 and considering the 12 divisions in Sarawak, with Kuching, the capital of Sarawak, having the highest population. As the bipartite dengue contact network formulated in the dengue study consists of eight human nodes 24 , this study uses 10 cases, or 1% of the averaged HFMD cases, which rounds the number of human nodes in the dengue study to the nearest ten, to explore the applicability of BNM. These 10 cases are assumed to be preschool and elementary school children because these are the most susceptible age groups reported 2,3,9 . The data on HFMD patients and on the locations visited by or related to the patients are not available either, because patients' health records in Malaysia are protected under the Personal Data Protection Act 2010 (PDPA) and are not accessible to the public. The patients' data, in terms of their addresses and the locations they visited, are therefore generated in this study by identifying related places around Kuching city. For the exploratory purpose of this study, 20 locations were determined in correspondence with the typical demography of the 10 patients assumed above. The number of locations is decided by rounding to the nearest ten the number of location nodes in the dengue study, whose bipartite dengue contact network consists of 19 location nodes 24 . These locations include actual residential houses, kindergartens, elementary schools, and other public places typically visited by the patients. Consequently, 10 distinct individuals and 20 unique locations are assumed in this study.
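The arithmetic behind the assumed case count can be retraced with a short sketch. The figures (47,900 cases, 45 EPID weeks, 8 weeks, 12 divisions, 1,000 cases, 1%) come from the text; the way the steps are chained here, and the reading of 1,000 as a rounded-up share for the most populous division, are our assumptions rather than a rule stated by the study.

```python
# Back-of-envelope retracing of the case estimate (figures taken from the text).
total_cases_sarawak = 47_900   # cumulative cases, EPID weeks 1 to 45
epid_weeks = 45
weeks_june_july = 8            # assumed length of June to July 2019
divisions = 12                 # divisions in Sarawak

weekly_average = total_cases_sarawak / epid_weeks
eight_week_total = weekly_average * weeks_june_july
equal_share = eight_week_total / divisions   # if all divisions carried the same load

# The study assumes a round 1,000 cases for Kuching. Treating this as a
# rounding-up of the equal share is our interpretation, not a stated rule.
assumed_kuching_cases = 1_000
human_nodes = assumed_kuching_cases // 100   # 1% of the assumed cases

print(f"weekly average   : {weekly_average:.1f}")
print(f"8-week total     : {eight_week_total:.1f}")
print(f"equal share      : {equal_share:.1f}")
print(f"human nodes (1%) : {human_nodes}")
```

Under an equal split, each division would carry roughly 710 cases over the eight weeks, so the assumed 1,000 cases for Kuching sits above the equal share, in line with its larger population.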
The date when the patient first shows symptoms is assumed to fall within the two-month time frame (June and July 2019). The addresses used in this study are selected from within Kuching city using the Global Positioning System (GPS) coordinates obtained from Google Maps. An edge in this study is formed when an HFMD patient visits a specific location. Using a random binary matrix generator, a 0-1 matrix of dimension 20 (location) by 10 (human) is generated and taken as the link matrix, where 1 represents the existence of a link and 0 its absence. There are 23 links formed between the human nodes and the location nodes in this study.
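A link matrix of this shape can be sketched as follows. The generator used in the study is not specified, so the function name `random_link_matrix`, the fixed seed, and the choice to draw exactly 23 links are illustrative assumptions. The edge labels follow the L1H1 naming convention used later in the paper.

```python
import random

def random_link_matrix(n_locations=20, n_humans=10, n_links=23, seed=0):
    """Return an n_locations x n_humans 0-1 matrix with exactly n_links ones.

    Hypothetical helper: the study only states that a random binary
    matrix generator was used and that 23 links resulted.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(n_locations) for j in range(n_humans)]
    chosen = set(rng.sample(cells, n_links))
    return [[1 if (i, j) in chosen else 0 for j in range(n_humans)]
            for i in range(n_locations)]

links = random_link_matrix()

# Label each edge by its location and human node, e.g. "L1H1".
edge_labels = [f"L{i + 1}H{j + 1}"
               for i, row in enumerate(links)
               for j, value in enumerate(row) if value == 1]

print(len(edge_labels), "edges, for example", edge_labels[:3])
```

Note that a purely random draw does not guarantee that every human or every location receives at least one link; a real data set would not need this step at all.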
The formulation of the bipartite HFMD network model is based upon the assumption that disease transmission follows the traditional epidemiology triangle 25 , which is then modified as shown in Figure 1. As the HFMD viruses are transmitted not only through human-to-human contact but also via contact with exterior surfaces contaminated with the viruses, the location nodes in this study are selected places in Kuching, Sarawak, with a focus on the characteristics of their exterior surfaces.

Formation of Graph Structure
The basic building block shown in Figure 1 is used in this study where the host vertex is substituted as the human node (H), representing the HFMD patients in Kuching, Sarawak. The time parameter of an edge in Figure 1 portrays the possibility that transmission of the virus from a location to humans occurs only within a due amount of time. The 10 HFMD patients are denoted as 10 human nodes (H). The 20 locations that the patients have visited are denoted as the location node (L). The 23 links are denoted as the edge (E), where for example, an edge formed between L1 and H1 is labelled as L1H1. Using the abovementioned data, the bipartite graph structure termed the BHC graph of the BHC network is formed. It is defined as in Equation 1.

Formulation of Bipartite Network
The formulation of a bipartite network involves three processes: parameter quantification for the location nodes, parameter quantification for the host (human) nodes, and quantification of the link weight. Firstly, based on past studies, two categories of location node parameters are identified: the location physical parameters and the location-specific meteorological parameters. Hence, four parameters are chosen in this study for the location node. They are the type of surface of objects in a location ($St_i$) 15 , temperature ($T_{i:j}$) 15 , humidity ($K_{i:j}$) 15 and the frequency of a location visited by a human ($Fl_i$) 24 . The surface types for a location are assumed based on the weather of the day. The surface types are considered dry and porous on a sunny day while they are considered non-dry and not porous if it rains. The weather in Kuching is obtained from AccuWeather (https://www.accuweather.com). As HFMD viruses stay longer outside the host on dry and non-porous surfaces, the surface type of a location visited by a human is given by Equation 2.
$$St_i = \begin{cases} 1, & \text{location } i \text{ has dry and non-porous surface} \\ 0, & \text{location } i \text{ does not have dry and non-porous surface} \end{cases} \qquad (2)$$

Visits of HFMD patients to a location directly affect the amount of HFMD viruses contained in the location. Thus, $Fl_i$ is included in the BHC network for location node i. It is given in Equation 3, where $Fh_{j:i}$ is a parameter of the human node shown in Equation 4. The two environmental parameters of a location node are temperature ($T_{i:j}$) and humidity ($K_{i:j}$). The average temperature and humidity of a location over the seven days immediately before a human was diagnosed with HFMD are taken.
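The surface-type indicator of Equation 2 and the seven-day averaging of the meteorological parameters can be sketched as below. The function names and the temperature readings are hypothetical and serve only to illustrate the two rules described in the text.

```python
def surface_type(is_dry_and_non_porous):
    """Indicator of Equation 2: 1 for a dry and non-porous surface, else 0."""
    return 1 if is_dry_and_non_porous else 0

def seven_day_mean(daily_values):
    """Mean of the last seven daily readings before the diagnosis date.

    daily_values is ordered oldest to newest and must end on the day
    right before the diagnosis.
    """
    last_seven = daily_values[-7:]
    if len(last_seven) < 7:
        raise ValueError("need at least seven daily readings")
    return sum(last_seven) / 7

# Hypothetical daily temperatures (degrees Celsius), oldest first.
temperatures = [29.0, 30.0, 31.0, 32.0, 31.0, 30.0, 29.0, 28.0]

print(surface_type(True), surface_type(False))
print(round(seven_day_mean(temperatures), 2))
```

The same averaging would apply to humidity; both averages are then normalized before they enter the link weight.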
Secondly, two parameters are determined for the human node. They are the total duration of stay in a location (Td) and the number of times a human visited the location (Fh). Td is the time recorded in minutes of the total duration of stay of a human in a location. Fh denotes the number of times a human node j visited a location node i and is given in Equation 4 where i = {1, 2, …, 20} and j = {1, 2, …, 10}.
Thirdly, the link weight is computed and named the HFMD contact strength (HCS). It represents the link affinity between the human and location nodes. A higher HCS between two bipartite nodes signifies a stronger attachment between the location node and the specific human node. The summation rule is used to quantify HCS 24 and is defined in Equation 5. HCS is computed using the R programming language. All the parameter values are normalized so that the respective numerical values range between 0 and 1.
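As a rough illustration of the summation rule, the sketch below adds the normalized parameters of one edge. It assumes that Equation 5 is a plain sum of the six parameters named above, and that the surface type is 0 for this edge; both are our assumptions, chosen because they reproduce the link weight of 4.1099 reported in the Results for edge L1H1. The sketch is written in Python rather than the R used in the study, and the function name `hfmd_contact_strength` is hypothetical.

```python
def hfmd_contact_strength(st, t, k, fl, td, fh):
    """Sum the normalized node parameters of one edge (assumed form of Equation 5)."""
    return st + t + k + fl + td + fh

# Normalized values reported for edge L1H1; st = 0 is an assumption.
hcs_l1h1 = hfmd_contact_strength(
    st=0.0,      # surface type (assumed, not reported in the text)
    t=0.8541,    # temperature
    k=0.8183,    # humidity
    fl=0.9000,   # frequency of the location being visited
    td=0.6375,   # total duration of stay
    fh=0.9000,   # number of visits by the human
)

print(round(hcs_l1h1, 4))
```

Because each term lies between 0 and 1 after normalization, a link weight under this assumed form lies between 0 and 6.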

Ranking of Location Nodes
Lastly, the location nodes are ranked and the HFMD hotspot is identified by implementing a ranking algorithm. Hypertext Induced Topic Selection (HITS), also known as the hubs and authorities link analysis algorithm, is used to rank the location nodes 28 . The HCS matrix and the BHC network obtained in the previous process serve as the input and the search space for the HITS algorithm, which adopts the power iteration method. The values of the resulting principal eigenvector are taken as the measure of vector density, termed the HFMD Hotspot Ranking (HHR) value. HHR takes values between 0 and 1 and is used to rank the location nodes. A location with a high vector density is considered a reservoir of the HFMD viruses, and people who visit the location will have a higher chance of contracting the disease. The higher the vector density of a location, the higher it is ranked, and the top-ranked locations can be identified as the hotspots of HFMD in this study.
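The ranking step can be illustrated with a minimal power-iteration sketch of HITS on a weighted location-by-human matrix. The toy weights, the function name `hits_hub_scores`, and the Euclidean normalization are assumptions made for illustration; the study does not state which normalization its implementation applies.

```python
def hits_hub_scores(weights, max_iterations=200, tolerance=1e-10):
    """Hub scores of the row nodes (locations) of a weighted bipartite matrix.

    weights[i][j] is the link weight between location i and human j.
    The loop converges to the principal eigenvector of W * W^T.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    hub = [1.0] * n_rows
    for _ in range(max_iterations):
        # Authority update: each human collects the hub scores of its locations.
        authority = [sum(weights[i][j] * hub[i] for i in range(n_rows))
                     for j in range(n_cols)]
        # Hub update: each location collects the authority scores of its humans.
        new_hub = [sum(weights[i][j] * authority[j] for j in range(n_cols))
                   for i in range(n_rows)]
        norm = sum(x * x for x in new_hub) ** 0.5
        if norm == 0:
            return new_hub  # no links at all
        new_hub = [x / norm for x in new_hub]
        converged = max(abs(a - b) for a, b in zip(new_hub, hub)) < tolerance
        hub = new_hub
        if converged:
            break
    return hub

# Toy link-weight matrix: 3 locations by 2 humans (hypothetical values).
toy_weights = [[4.0, 0.0],
               [1.0, 2.0],
               [0.0, 0.5]]

scores = hits_hub_scores(toy_weights)
names = ["L1", "L2", "L3"]
ranking = [name for _, name in sorted(zip(scores, names), reverse=True)]
print(ranking)
```

With non-negative weights and this normalization, every score falls between 0 and 1, and the location carrying the heaviest links ranks first.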

Evaluation of Results
A two-step verification is conducted to verify the results obtained: benchmark verification and analytical verification. Benchmark verification is carried out using the Root Mean Square Error (RMSE) computed between the ranking values (HHR) acquired from the benchmark system and those from the BHC network. An RMSE threshold of less than 0.05 is set for a model to be considered acceptable 29 . Analytical verification using Spearman's Rank Correlation Coefficient (SRCC) is applied to compare the ranking of location nodes between the hub (location) matrix and the HFMD hotspot ranking values (HHR). SRCC is suitable for measuring the closeness of rankings in small networks 24 . An SRCC threshold of greater than 0.70 is set to verify a model, indicating a high positive correlation 30 . This study considers the BHC network model verified only when it fulfils both conditions, that is, RMSE is less than 0.05 and SRCC is greater than 0.70. In the next section, the results obtained will be presented and discussed.
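The two checks can be sketched in a few lines. The thresholds of 0.05 and 0.70 are those stated above; the function names are hypothetical, and computing SRCC as a Pearson correlation on average ranks (so that ties are handled) is one common formulation, assumed here rather than taken from the study.

```python
import math

def rmse(predicted, reference):
    """Root mean square error between two equally long score lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))

def average_ranks(values):
    """Rank values from largest (rank 1) to smallest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman(a, b):
    """Spearman's rank correlation; undefined if either input is constant."""
    ra, rb = average_ranks(a), average_ranks(b)
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)

def is_verified(rmse_value, srcc_value):
    """Both thresholds from the text must hold."""
    return rmse_value < 0.05 and srcc_value > 0.70

# Hypothetical scores for three locations from a model and a benchmark.
model_scores = [0.90, 0.40, 0.10]
benchmark_scores = [0.88, 0.42, 0.11]
print(is_verified(rmse(model_scores, benchmark_scores),
                  spearman(model_scores, benchmark_scores)))
```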

Results and Discussion
The interpretation of the results obtained in this study is subject to the assumptions and scope of the study. Using the quantified parameters, the respective normalized values of the parameters for the individual location nodes and human nodes are presented in Table 1. It shows the parameter values of location node i, which are temperature (T), humidity (K), and frequency of a location visited by a human (Fl), and the parameter values of human node j, which are the total duration of stay in a location (Td) and the number of times a human visited the location (Fh). The first row in Table 1 shows that human node 1 visited location node 1. For this visit, the normalized values of the temperature (T) and humidity (K) of location node 1 are 0.8541 and 0.8183, and the normalized frequency of location node 1 being visited by human node 1 (Fl) is 0.9000. In addition, the normalized total duration of stay in location node 1 by human node 1 (Td) is 0.6375, whereas the normalized number of times human node 1 visited location node 1 (Fh) is 0.9000. The link weight (HCS) calculated from Equation 5 is given in Table 2, which is also the HCS matrix of the BHC network. It shows the weights of the 23 links formed between the bipartite nodes in the BHC network. The first row shows that the weight of the edge linking location node 1 and human node 1 is 4.1099. With the location and human node parameters and the link weights quantified, the resulting BHC network is presented graphically in Figure 3. The values shown in each node are the normalized values of the parameters quantified for that node. For location node 1, the normalized values of the temperature, humidity, and frequency of visit when human node 1 visited it ($T_{1:1}$, $K_{1:1}$ and Fl) are 0.8541, 0.8183, and 0.9, and the link weight is 4.1099.
As for human node 1, the normalized value of the duration of stay at location node 1 ($Td_{1:1}$), location node 8 (

The HHR generated for each location node is presented in Table 3, together with the ranking of all the location nodes. Location node L16 ranks top among the 20 location nodes because it has the highest HHR value. This implies that the infectious HFMD vector density is the highest in L16. The benchmark system used in the benchmark verification is UCINET 6 31 . The ranking values for the location nodes produced by the benchmark system are the benchmarked HHR and are normalized accordingly. The resulting RMSE is 0.0005640, which is less than 0.05. For the analytical verification, the sum of the hub matrix is calculated and ranked. Then, it is compared with the ranking of the location nodes based on HHR using SRCC. The SRCC value obtained is 0.874, which is greater than 0.70. It shows that the HHR of the location nodes is positively and strongly correlated with the hub matrix for the location nodes. Both the RMSE and the SRCC fulfil the threshold values set, implying that the BHC network model is verified. The bipartite HFMD network model formulated using the assumed data managed to rank the location nodes and identify the hotspot of HFMD. The results show that the top six location nodes are L16, L8, L20, L1, L13, and L19, as presented in Table 3. The factors contributing to this ranking are the parameters of the location nodes and human nodes. From the basic building block (Figure 1), the components crucial to the identification of the HFMD hotspots are the parameters of the location and the mobility of the human. The parameters quantified and incorporated in this study show the importance of understanding the dynamic interactions between the biomedical and the social factors of the disease.
The factors include the physical nature of a location, the biological nature of the disease, the demography of humans, and the mobility or social context of humans. Apart from the nature of HFMD itself, the other factors are unique to the geographical location of a region of a country, which is closely related to its meteorological characteristics or environmental features.
Similarly, one human is different from the other by nature and thus, it is unrealistic to assume the whole population is homogeneous in modeling the disease. A systemic view is needed to combat the HFMD outbreak, where it is viewed as a system, a real-world system. The bipartite network modeling approach shows its potential in abstracting the HFMD system by incorporating the distinct features of each location node and human node to capture the interaction between the locations and humans.
Appropriate measures and policies could be implemented by identifying the locations with high potential for the transmission of the highly infectious HFMD. Since there is no one-size-fits-all method for dealing with an HFMD outbreak, public health authorities and related policymakers could adopt customized practices and decisions based on the geographical and social characteristics of the identified hotspot locations. Targeting specific locations (hotspots) could help relieve the disease burden for children and the social pressure on communities, and could optimize the costs incurred in dealing with the outbreak compared with situations where public health measures are applied to the whole area.

Conclusion
In this study, the bipartite network modeling approach has been employed to explore the modeling of a bipartite HFMD network and the identification of its hotspot, with the results verified using benchmark and analytical verifications. The basic building block of the network consists of two nodes: the location node and the human node. The individual parameters of each node type are incorporated. The location node parameters comprise the type of surface of the objects in a location, the temperature and humidity of the location, and the frequency of visits by humans. The total duration of stay of a human at a location and the number of times the human visits the location are the parameters chosen for the human node. The weight of an edge that joins the two types of nodes is quantified using the summation rule. Using the HITS algorithm, the location nodes are ranked so that hotspots of the disease can be identified. By identifying the locations with high potential for the transmission of the highly infectious HFMD, timely and effective measures and policies could be customized accordingly by the public health authorities and related policymakers. BNM is a promising approach in epidemiological studies, particularly for identifying potential hotspots of infectious diseases. Possible future studies include further evaluation of the model using real data to confirm and validate the BNM approach presented, and research into other potential parameters for formulating the network model for HFMD.