Network science meets respiratory medicine for OSAS phenotyping and severity prediction

Obstructive sleep apnea syndrome (OSAS) is a common clinical condition. The way that OSAS risk factors associate and converge is not a random process. As such, defining OSAS phenotypes fosters personalized patient management and population screening. In this paper, we present a network-based observational, retrospective study on a cohort of 1,371 consecutive OSAS patients and 611 non-OSAS control patients in order to explore the risk factor associations and their correlation with OSAS comorbidities. To this end, we construct the Apnea Patients Network (APN) using patient compatibility relationships according to six objective parameters: age, gender, body mass index (BMI), blood pressure (BP), neck circumference (NC) and the Epworth sleepiness score (ESS). By running targeted network clustering algorithms, we identify eight patient phenotypes and corroborate them with the comorbidity types. Also, by employing machine learning on the uncovered phenotypes, we derive a classification tree and introduce a computational framework which renders the Sleep Apnea Syndrome Score (SASScore); our OSAS score is implemented as an easy-to-use, web-based computer program which requires less than one minute for processing one individual. Our evaluation, performed on a distinct validation database with 231 consecutive patients, reveals that OSAS prediction with SASScore has a significant specificity improvement (an increase of 234%) for only 8.2% sensitivity decrease in comparison with the state-of-the-art score STOP-BANG. The fact that SASScore has higher specificity makes it appropriate for OSAS screening and risk prediction in large general populations.


INTRODUCTION
Obstructive Sleep Apnea Syndrome (OSAS) is a serious clinical disorder caused by abnormal breathing pauses that occur during sleep; this results in sleep fragmentation and excessive daytime somnolence (Simon & Collop, 2012; Fischer et al., 2012; Lévy et al., 2014). There are studies reporting the epidemic incidence of OSAS, with worrying increasing rates over the last 20 years (Young, Peppard & Gottlieb, 2002; Punjabi, 2008).
Network Medicine has received a lot of attention during the last decade (Menche et al., 2015; Goh et al., 2007; Barabási, 2007; Vidal, Cusick & Barabasi, 2011); this trend is fuelled by the fact that complex network science can bring significant advances in various medical fields like genomics (Sharma et al., 2013; Rozenblatt-Rosen et al., 2012), drug-target interaction (Yıldırım et al., 2007), or cell metabolism (Barabasi & Oltvai, 2004; Han, 2008). Consequently, it has been recently suggested that network medicine can also be used for addressing important problems in respiratory medicine (Faner et al., 2014; Divo et al., 2015).

METHODS
Description of databases
In order to use complex network tools for OSAS research, we need real-world OSAS patient datasets. Unfortunately, OSAS patient datasets are scarce and not public. Several factors explain this situation: big data techniques were only recently considered as tools for respiratory medicine and OSAS; all patients must undergo hospital polysomnography (which entails a complex, expensive and time-consuming process); and coordinated research efforts for gathering data were only recently introduced.
For instance, the biggest such OSAS database, namely European Sleep Apnea DAtabase-ESADA (Hedner et al., 2011), is not public and it has gathered data from 15,956 patients in 24 sleep centers from 16 countries since 2007. Also, a recent OSAS study (Marti-Soler et al., 2016) where the validation is similar to our approach uses only one (private) validation database, comprising 1,101 patients (Santos-Silva et al., 2012).
As a result, in order to perform network investigation on OSAS, we built our own Apnea Patients Database (APD) consisting of consecutive patients with suspicion of sleep breathing disorders, who were evaluated at Victor Babes Regional Hospital in Timisoara (Western Romania) between March 2005 and March 2012 under the supervision of the hospital's Ethics Committee (internal briefing note no. 10/12.10.2013). At the initial visit, the study protocol was clearly explained to obtain the patient's consent and the acceptance of referral physicians. Subsequently, respiratory polygraphy was performed using both Philips Respironics' Stardust polygraph (2005) and MAP's POLY-MESAM IV (1998). Polysomnography (PSG) was carried out with Philips Respironics' Alice 5 Diagnostic Sleep System, according to the appropriate guidelines (Rechtschaffen & Kales, 1968). The polygraphy was performed both at home and at the hospital, whereas PSG measurements were performed at the hospital under medical supervision. To preserve the accuracy of the information, all collected data were carefully verified; throughout this process, we ensured complete data confidentiality. Our observational, retrospective study employs only standardized non-invasive procedures and excludes all unnecessary investigations. Moreover, visits did not entail additional effort for the patients or supplemental budget for the clinic.
All 1,371 patients who completed the sleep study protocol and signed informed consent are included in the APD, each with 108 corresponding breathing parameters and anthropometric measurements. The APD distribution of measured AHI is presented in Fig. 1.
In order to verify whether there is any difference between apnea and non-apnea populations in terms of how risk factors associate and converge, we built a non-OSAS database (NAD) of 611 people, using the same procedure as for the APD. Also, to evaluate the prediction score derived from our study, we gathered a distinct test database (TD) of 231 patients over a distinct period of time (the fall of 2013), by following the same procedure. Figure 2 presents the distinct roles of our three databases, as well as the relationship between them.

Analysis of APD and TD
As patients within TD are used to validate our OSAS prediction with SAS Score, which was obtained by processing patients from APD, we check that the parameter distributions in TD are not too close to the corresponding distributions in APD. Such an investigation is required considering that, although data for the two databases were gathered over distinct periods of time, all measurements were performed in a given geographical region.
To this end, Table 1 presents the distributions of the most relevant parameters in our research (Age A, Body Mass Index BMI, Neck Circumference NC, High Blood Pressure HBP, and Epworth Sleepiness Score ESS) within the validation population (TD) and the apnea patients database (APD), in the form of measured averages and their corresponding standard deviations, as well as Gini coefficients. We rely on Gini coefficients for a quantitative measure of data dispersion.
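As an illustration of the dispersion measure mentioned above, the following minimal sketch computes a Gini coefficient for one parameter. It assumes the standard rank-weighted definition over non-negative values; the source does not state which Gini variant was used, and the function name `gini` is hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even).

    Assumption: standard rank-weighted definition; the paper does not
    specify the exact variant it used.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n, ranks 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n
```

Applied separately to the APD and TD values of a parameter (for instance BMI), two such coefficients can be compared in the way Table 1 does.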
We also provide a visual comparison of the distributions of AHI and the relevant risk factor parameters in APD and TD (see Fig. 3). All these results show that data were randomly gathered, so that the main parameters are normally distributed. However, Gini coefficients (especially for A, BMI and ESS) indicate an important difference between APD and TD distributions. Moreover, Fig. 3 shows a significantly different AHI histogram for TD in comparison with APD. As such, in APD there are many patients with AHI >120, whereas in TD there is no such patient. Also, in APD the largest number of patients associated with a given AHI value corresponds to AHI values <20; in contrast, in TD, the largest number of patients with a given AHI value corresponds to AHI values around 40.
Figure 2 [...] and 2012, is used to build patient phenotypes and to render the SAS Score. The distinct Test Database (TD), comprising 231 consecutive patients who arrived at the hospital with suspicion of OSAS in 2013, is used to verify the sensitivity and specificity of predicting patients' AHI and OSAS categories. The Non-OSAS patients Database (NAD) consists of consecutively assessed people who were not diagnosed with OSAS during the spring 2015-summer 2016 period; it is used to test for cluster consistency (i.e., to compare how risk factors converge in clusters for OSAS patients versus people without OSAS).

Building the patient network
An unweighted network is a graph $(V, E)$, which consists of a set of vertices (or nodes) $V$ and a set of edges (or links) $E$ that represent connections $[v, w] \in E$ between certain pairs of vertices $v, w \in V$. We build the unweighted Apnea Patients Network (APN) by assigning vertices and edges: each node corresponds to a distinct patient in our OSAS patients database APD, while an edge (link) is created between two vertices if there is a risk factor compatibility between the patients represented by the two vertices. The risk factor compatibility is a binary function $f_{RFC} \in \{0, 1\}$ (0 means incompatibility and 1 means compatibility) based on six parameters with high relevance for OSAS: age, gender, BMI, neck circumference, blood pressure (systolic and diastolic), and Epworth Sleepiness Score. We build our APN by considering that $f_{RFC} = 1$ if at least four out of six parameters are identical; otherwise $f_{RFC} = 0$.
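The 4-out-of-6 compatibility rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each parameter has already been mapped to a category so that "identical" is well defined, and the field names (`age_group`, `bmi_group`, and so on) and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical field names; each value is assumed to be a pre-computed category.
PARAMS = ("age_group", "gender", "bmi_group", "neck_group", "bp_group", "ess_group")


def compatible(p, q, min_matches=4):
    """Risk factor compatibility: True when at least min_matches parameters agree."""
    matches = sum(1 for key in PARAMS if p[key] == q[key])
    return matches >= min_matches


def build_network(patients, min_matches=4):
    """Return the edge set of the unweighted patient network as index pairs (i, j), i < j."""
    edges = set()
    for i, j in combinations(range(len(patients)), 2):
        if compatible(patients[i], patients[j], min_matches):
            edges.add((i, j))
    return edges
```

Changing `min_matches` reproduces the alternative link-filtering criteria (for example 3-out-of-6 or 5-out-of-6) whose link densities can then be compared.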
The six parameters are selected from the pool of all relevant risk factors (all measured parameters can be found in the Supplemental Information 1) because they can be measured easily and objectively; such objective measurements can be performed anywhere, and are widely accepted in the medical literature (Lévy et al., 2014). In contrast, other scores consider snoring and witnessed apnea episodes as factors, but these parameters cannot be observed or measured objectively.
The reason for adopting the 4-out-of-6 criterion is that it ensures the right amount of link density in the APN: there are enough links for the APN to be connected, but not so many that communities (i.e., clusters) can no longer be rendered with energy model layouts (Noack, 2009). As Fig. 4 shows that the 4-out-of-6 link filtering represents the best alternative, we use this criterion to build the APN. To the best of our knowledge, this link filtering procedure is original and has not been used before in such network-based approaches.

APN clustering
We clustered the APN by using a dual clustering methodology: energy-model layouts plus modularity classes, similar to the approach from (Udrescu et al., 2016; Udrescu et al., 2014). Energy models are force-directed network layout algorithms, namely visual tools that assign positions in the Euclidean space to both nodes and edges (Noack, 2009). To this end, we used the Force Atlas 2 algorithm (Jacomy et al., 2014) as the network layout; this layout is very effective in clustering various types of complex networks, as it builds on previous theoretical foundations of force-directed attraction-repulsion algorithms (Fruchterman & Reingold, 1991; Noack, 2003). Indeed, Force Atlas 2 clusters complex networks by producing well-defined topological clusters. The overview of the entire clustering process, including testing cluster convergence with non-OSAS control patients and the validation of SAS Score, is presented in Fig. 5.
For a network $(V, E)$, a layout algorithm running in a Euclidean $k$-dimensional space $\mathbb{R}^k$ places each vertex $v \in V$ at a corresponding position $p_v \in \mathbb{R}^k$ and assigns a Euclidean distance $\|p_v - p_w\|$ to each pair of vertices $v, w \in V$. In particular, energy model layout algorithms are developed as attraction-repulsion (A-R) force systems (Noack, 2009). As such, in an A-R system, adjacent vertices attract while all the other pairs of vertices repel; this is the mechanism that leads to the emergence of groups of densely connected vertices, which we interpret as communities or clusters. The A-R force values are proportional to a power ($A$ or $R$) of the Euclidean distance between the nodes: the attraction between adjacent vertices $v$ and $w$ is proportional to
$$\|p_v - p_w\|^{A},$$
and the repulsion between any two vertices $v$ and $w$ is proportional to
$$\|p_v - p_w\|^{R}.$$
To generate topological communities that are consistent with connection densities, thus having the advantage of emphasizing distinct communities and clusters (Jacomy et al., 2014), attraction between two nodes must not decrease with the Euclidean distance between them, while repulsion must not increase with that distance; therefore, we have $A \geq 0$ and $R \leq 0$. Representative force-based layouts of this kind are the Fruchterman and Reingold model ($A = 2$, $R = -1$) (Fruchterman & Reingold, 1991) and the LinLog model ($A = 0$, $R = -1$) (Noack, 2003). For all A-R energy models, the resulting positions are determined by a local energy minimum (Noack, 2009):
$$\min_{p} \left( \sum_{[v,w] \in E} \frac{\|p_v - p_w\|^{A+1}}{A+1} - \sum_{v, w \in V,\ v \neq w} \frac{\|p_v - p_w\|^{R+1}}{R+1} \right), \qquad (1)$$
where a term whose exponent $A+1$ or $R+1$ equals zero is read as $\ln \|p_v - p_w\|$.
In addition to the layout algorithm, we used modularity-based network clustering (Girvan & Newman, 2002), a method that was proven to be effective in network medicine (Diez, Agustí & Wheelock, 2014; Faner et al., 2014). Network clustering consists of assigning each vertex $v \in V$ to one of the disjoint vertex subsets (or clusters) $C_i$, such that $\bigcup_i C_i = V$. In our APN clustering approach, modularity classes $C_i$ are represented with distinct colors. Because the APN is an unweighted network, the modularity of any clustering is defined in Eq. (2), where $|E_{C_i}|$ is the number of edges in cluster $C_i$, $|E|$ is the total number of edges in the network, $d_{C_i}$ is the total degree of the nodes in cluster $C_i$, and $d$ is the total degree of all nodes in the network:
$$\sum_{i} \left( \frac{|E_{C_i}|}{|E|} - \frac{d_{C_i}^2}{d^2} \right). \qquad (2)$$
Noack demonstrated that energy-model layout algorithms produce topological clusters that are equivalent to those rendered by modularity-based clustering (Noack, 2009). However, force-directed layouts provide additional topological information about clusters. As such, for a more accurate analysis, it is recommended that both modularity clustering and force-directed layouts are used (Noack, 2009; Jacomy et al., 2014; Udrescu et al., 2016).
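To make the modularity definition concrete, the sketch below evaluates it for a given clustering of an unweighted network, using the quantities named in the text: intra-cluster edge counts, the total edge count, per-cluster total degree and overall total degree. It assumes the usual form in which each cluster contributes its edge fraction minus the squared degree fraction; the function name and the edge-list input format are hypothetical, and the code only scores a clustering, it does not search for the best one.

```python
def modularity(edges, cluster_of):
    """Modularity of a clustering of an unweighted, undirected network.

    edges: iterable of (v, w) pairs, each undirected edge listed once.
    cluster_of: dict mapping every vertex to its cluster label.
    Assumed form: sum over clusters of |E_C| / |E| - (d_C / d) ** 2.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    intra = {}   # number of edges with both endpoints in the cluster
    degree = {}  # total degree of the nodes in the cluster
    for v, w in edges:
        cv, cw = cluster_of[v], cluster_of[w]
        degree[cv] = degree.get(cv, 0) + 1
        degree[cw] = degree.get(cw, 0) + 1
        if cv == cw:
            intra[cv] = intra.get(cv, 0) + 1
    total_degree = 2 * m  # every edge contributes to two node degrees
    return sum(
        intra.get(c, 0) / m - (degree[c] / total_degree) ** 2
        for c in degree
    )
```

For two triangles joined by a single edge, placing each triangle in its own cluster scores higher than placing all six nodes in one cluster, which matches the intuition that modularity rewards dense groups with few links between them.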

APN analysis
The APN representation resulting from our clustering methodology is presented in Fig. 6, where the distinct colors correspond to distinct modularity classes, and the well-defined topological clusters are explained accordingly. In Fig. 6, we interpret the eight topological clusters as distinct phenotypes, and provide the prevalence of each AHI risk group as percentages (L, Mi, Mo, Se)% for each such cluster/phenotype.
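The per-cluster (L, Mi, Mo, Se)% statistics can be computed along the following lines. Note the assumption: the AHI cut-offs used here (5, 15 and 30 events per hour) are the commonly used clinical thresholds and are not stated in this section, so they should be replaced if the study used different ones. Function names are hypothetical.

```python
from collections import Counter


def risk_group(ahi):
    """Map an AHI value to a risk group.

    Assumed cut-offs (events/hour): <5 low, 5 to <15 mild,
    15 to <30 moderate, >=30 severe. Not taken from the source.
    """
    if ahi < 5:
        return "L"
    if ahi < 15:
        return "Mi"
    if ahi < 30:
        return "Mo"
    return "Se"


def risk_profile(ahi_values):
    """Return rounded (L, Mi, Mo, Se) percentages for one cluster."""
    n = len(ahi_values)
    if n == 0:
        return (0, 0, 0, 0)
    counts = Counter(risk_group(a) for a in ahi_values)
    return tuple(round(100.0 * counts[g] / n) for g in ("L", "Mi", "Mo", "Se"))
```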

Non-OSAS Patients Network (NPN) analysis
Using the information from the non-OSAS database (NAD) of 611 people, we employ the same procedure as for the APN from Fig. 6. The NAD represents the control population, consisting of people who are not diagnosed with OSAS. The result of applying our methodology to NAD patients is presented in Fig. 7, where the colors correspond to distinct modularity classes; at the same time, topological communities rendered with the energy-model layout Force Atlas 2 are indicated and explained.
Upon visual inspection, Fig. 7 suggests that in the non-OSAS control population there are more patterns of risk factor association, which leads to 12 topological clusters and modularity classes that are not correlated with OSAS or AHI risk groups. As such, according to our network-based methodology, the six considered risk factors appear to converge consistently only for the individuals with OSAS.
Figure 6 Apnea Patients Network (APN) obtained with data from the Apnea Patients Database (APD), according to the risk factor compatibility relationship, using our dual network clustering methodology (i.e., modularity classes and energy-model layouts). The assigned colors correspond to modularity classes, and the 8 topological clusters are indicated. For each topological cluster, statistics are provided in red (as percentages) for all AHI risk groups: low, mild, moderate, and severe, using the format (L, Mi, Mo, Se)% (e.g., in Cluster 2 the patients are distributed on risk groups as follows: 9% L, 15% Mi, 19% Mo, 57% Se). The risk group classification is made with AHI values that are obtained by actually performing polysomnography (PSG) and polygraphy.

Description of phenotypes
In order to have a clear characterization of our rendered phenotypes, we track the OSAS comorbidities (as recorded in the APD) within the APN. To this end, we consider the following comorbidity types: cardiovascular (e.g., hypertension or stroke), nutritional (e.g., obesity or diabetes), and respiratory-related (e.g., COPD or asthma). Figure 8 presents the highlighted comorbidities within the APN by using distinct colors for comorbidity types that appear individually, as well as for comorbidity type overlaps (cardiovascular + nutritional, cardiovascular + respiratory, nutritional + respiratory, cardiovascular + nutritional + respiratory), and patients without known comorbidities. In light of the comorbidity and AHI risk group statistics provided in Table 2 for each cluster resulting from our network analysis (as illustrated in Figs. 6 and 8), we characterize the phenotypes as follows:
• Phenotype 1: Mostly patients within the Se AHI risk group, who are generally obese males with thick neck, high blood pressure, sleepiness, and age between 40 and 60 years. For a large majority of these patients, all comorbidity types overlap.
• Phenotype 2: The large majority of these patients have Mo and Se apnea forms; they are obese females with thick neck, high blood pressure, sleepiness, and age between 40 and 60 years. In this phenotype there are no patients with only respiratory comorbidities, and only a few have only nutritional comorbidities.
• Phenotype 3: The patients have mostly Mo and Se apnea, but there are fewer Se forms in comparison with other phenotypes; they are obese females with thin neck, high blood pressure, no sleepiness, and age between 40 and 60 years. This phenotype does not contain patients with only respiratory comorbidities.
• Phenotype 4: Mostly Se patients, although there is a significant number of Mo individuals; they are generally obese males with thick neck, high blood pressure, sleepiness, and age over 60 years. In this phenotype, only a few patients have only respiratory comorbidities.
• Phenotype 5: Mostly Se, Mo, and Mi patients, who are obese young males with thick neck, high blood pressure, no sleepiness, and age between 20 and 40 years. In this phenotype, almost all patients have nutritional comorbidities or comorbidity overlaps that include the nutritional type.
• Phenotype 6: Consists of mostly Mo and Se apnea; the patients with this phenotype are generally obese males with thick neck, no high blood pressure, no sleepiness, and middle aged (40-60 years old). Their comorbidities are mostly nutrition-related (either a single nutritional comorbidity or an association of comorbidities that contains the nutritional type).
• Phenotype 7: Patients with mostly Mo and Se apnea, but with fewer Se forms in comparison with other phenotypes; this phenotype's patients are generally males of all ages with thin neck, no high blood pressure, and no sleepiness. The majority of these patients have no comorbidities; however, those who have a comorbidity tend to have respiratory-related problems.
• Phenotype 8: Patients mostly within the Se, Mo, and Mi AHI risk groups; they are males from all age groups with thin neck, no sleepiness, but with high blood pressure. These patients tend to have a single cardiovascular comorbidity type or an association of comorbidities that includes the cardiovascular type.

Figure 8 The Apnea Patients Network (APN), with highlighted individual comorbidities and associations of comorbidities. The red nodes correspond to patients that have only cardiovascular comorbidities (C in A), the yellow nodes to only nutrition-related (N in B), and blue nodes represent OSAS patients with only respiratory comorbidities (R in C). Patients with overlapping comorbidity types are represented within the APN as follows: orange nodes correspond to cardiovascular plus nutritional comorbidities (C + N in D), purple nodes to cardiovascular plus respiratory comorbidities (C + R in E), green nodes to nutritional plus respiratory comorbidities (N + R in F), and black nodes to the superposition of cardiovascular, nutritional and respiratory comorbidities (C + N + R in G). The OSAS patients without known comorbidities are highlighted in the APN as white nodes (H in H). We also provide the APN where all nodes are labeled according to their comorbidity or overlapping comorbidities in I.

Table 2 Description of the eight relevant Apnea Patients Network (APN) phenotypes. The phenotypes (Ph) are listed with a short description in terms of most or least predominant AHI risk groups (low risk-L, mild-Mi, moderate-Mo, and severe-Se), significant combinations of the 6 objective parameters, and most/least predominant comorbidity types (cardiovascular-C, nutritional-N, respiratory-R, and without comorbidities-H) or comorbidity type overlaps (C + N, cardiovascular + nutritional; C + R, cardiovascular + respiratory; N + R, nutritional + respiratory; C + N + R, cardiovascular + nutritional + respiratory). For each phenotype, we provide the corresponding percentages for comorbidity types and comorbidity type associations, as well as the percentage of patients pertaining to one of the AHI risk groups; the boldface entries correspond to representative values, in terms of simple majority. In phenotype descriptions, HBP stands for high blood pressure.
Table 2 column headers: Ph.; Description; Comorbidity types and associations (%); AHI risk groups (%).
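The comorbidity coding described for Fig. 8 amounts to a lookup from the set of comorbidity types recorded for a patient to a label and node color. The sketch below encodes that mapping as stated in the caption; the dictionary and function names are hypothetical.

```python
# Label and node color for each combination of comorbidity types,
# following the Fig. 8 caption (C = cardiovascular, N = nutritional,
# R = respiratory, H = no known comorbidities).
COMORBIDITY_CODING = {
    frozenset(): ("H", "white"),
    frozenset({"C"}): ("C", "red"),
    frozenset({"N"}): ("N", "yellow"),
    frozenset({"R"}): ("R", "blue"),
    frozenset({"C", "N"}): ("C + N", "orange"),
    frozenset({"C", "R"}): ("C + R", "purple"),
    frozenset({"N", "R"}): ("N + R", "green"),
    frozenset({"C", "N", "R"}): ("C + N + R", "black"),
}


def comorbidity_label(types):
    """Return (label, color) for an iterable of comorbidity type codes."""
    return COMORBIDITY_CODING[frozenset(types)]
```

Counting these labels per cluster gives the kind of comorbidity percentages that Table 2 reports for each phenotype.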

OSAS risk prediction with SAS Score
Classifying any new patient in one of the phenotypes can be performed by adding the new patient to the APN and then running the modularity class and force-directed layout algorithms in Gephi one more time. However, during the screening process, physicians are frequently unable to perform these rather complex and time consuming steps (i.e., manipulation of databases and managing Gephi plugins), because of the obvious constraints.
In order to deal with this problem, we propose a simplified solution for classifying de novo patients using a computer algorithm that is implemented as a web-based application. As such, we employ supervised machine learning in order to classify any new person in one of the eight validated phenotypes, based on the six relevant parameters. To this end, we choose decision tree learning, because decision trees are easy to use, quick, and intuitive for medical personnel. We use R and available recursive tree partitioning libraries from R Studio in order to perform recursive tree partitioning, thus generating a phenotype classification tree (Therneau et al., 2010;Hothorn, Hornik & Zeileis, 2006).
Given the eight phenotypes determined with our network procedure, and the features extracted from them, we label each patient from the APD with the cluster/phenotype to which they belong. From this point, we employ supervised learning methods and then test them in order to choose the tree which provides the highest reliability while maintaining a reasonable complexity.
The tested mining algorithms are: recursive partition tree, conditional inference tree, evolutionary tree, oblique tree, maptree, naive Bayes, random forest and linear regression (Therneau et al., 2010; Hothorn, Hornik & Zeileis, 2006; Grubinger, Zeileis & Pfeiffer, 2011). The algorithm test procedure relies on applying the resulting decision trees to de novo patients from the Test Database (TD). We take all patients in TD and assign each of them to a phenotype with the classification tree. In parallel, we add the TD patients to the APD, and rerun the entire layout clustering procedure. Indeed, we obtain the same eight patient phenotypes, and we consider the phenotype labels that are assigned to patients in TD by our network clustering procedure as being the reference (i.e., the correct phenotype assignment). Therefore, we quantify the efficiency of each decision tree by comparing its output with the reference represented by the network-based classification. The test results indicate the evolutionary tree (evtree) (Grubinger, Zeileis & Pfeiffer, 2011), which is given in Fig. 9, as the best method for our classification problem.
Figure 9 The phenotype classifying tree, obtained by running evtree on patients from APD. Each new person is assigned to one of the eight phenotypes, denoted as Ph1-Ph8 (dominant phenotypes are emphasized). Decisions are made according to some of the six relevant parameters (gender-G, age-A, body mass index-BMI, systolic blood pressure-SBP, diastolic blood pressure-DBP, Epworth sleepiness score-ESS), but also according to variables computed from the six parameters (High Blood Pressure-HBP, Neck Group-NG, Obesity-Ob), as described in the figure legend.
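The comparison between tree-assigned and network-assigned phenotypes reduces to a label agreement count. The following sketch shows one way to compute the overall accuracy of a candidate tree against the network-based reference; the function name and the `Ph1`..`Ph8` string labels are illustrative assumptions.

```python
def classification_accuracy(predicted, reference):
    """Fraction of patients whose tree-assigned phenotype matches the reference.

    predicted: phenotype labels from a classification tree (e.g. "Ph1".."Ph8").
    reference: phenotype labels from the network-based clustering, same order.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(1 for p, r in zip(predicted, reference) if p == r)
    return hits / len(reference)
```

Running this once per candidate algorithm on the TD labels gives a single number per method, which is enough to rank the candidates before weighing tree complexity.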
The classification tree in Fig. 9 uses the six relevant parameters (gender-G, age-A, body mass index-BMI, systolic blood pressure-SBP, diastolic blood pressure-DBP, Epworth sleepiness score-ESS), but also variables that are computed from the six parameters (High Blood Pressure-HBP, Neck Group-NG, Obesity-Ob) according to Eqs.
When we test the classification tree in Fig. 9 on patients from TD and compare the results against the reference classification (i.e., network-based), the resulting classification accuracy is 69.30%; the detailed prediction accuracy results for TD are given in Table 3.
In order to generate our SAS Score from the classification tree in Fig. 9, we follow the next sequence of steps:
1. Perform anthropometric measurements on each new patient.
2. Classify each patient in one of the eight phenotypes (using the classification from Fig. 9).
3. Refer to cluster normalization in order to get phenotype parameter averages from [...] (6)
SAS Score values are ≥1 and generally <7; this range of values can further serve to classify patients as being at risk of developing OSAS or not. A patient with SAS Score > threshold is considered at risk, whereas SAS Score ≤ threshold means that there is no OSAS risk. In order to attain the main objective of our paper, namely to make population-wide OSAS monitoring and screening effective, we consider the higher specificity as being more important, so we choose threshold = 3.9, which determines a sensitivity of 0.8025 and a specificity of 0.4189. Taken together, these results obtained with the TD, consisting of de novo patients only, suggest that our SAS Score significantly outperforms STOP-BANG in terms of specificity (i.e., it is 2.34 times better), while remaining only slightly worse than STOP-BANG in terms of sensitivity (8.2% decrease).
We consider these results as being particularly relevant, because the distribution of AHI in the TD is notably different from the distribution of AHI in APD (see Fig. 3 panels A and B).
Table 5 Average anthropometric measurements with standard deviations (expressed as percentages (%)) for the anthropometric parameters (Age A, BMI, Neck Circumference NC, High Blood Pressure HBP, Epworth Sleepiness Score ESS, Apnea-Hypopnea Index AHI) in the APN clusters/phenotypes after 100 runs (for instance, the average age in cluster 2 is 54.11 years with a standard deviation of 3.51%).
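The threshold rule and the two reported quality measures can be illustrated as follows. This is a minimal sketch that takes already computed SAS Score values as input (the score equation itself is not reproduced here) together with the diagnosed OSAS status; the function name and argument names are hypothetical, and only the default threshold of 3.9 comes from the text.

```python
def screening_performance(scores, has_osas, threshold=3.9):
    """Return (sensitivity, specificity) of the rule 'score > threshold means at risk'.

    scores: SAS Score values, one per patient (computed elsewhere).
    has_osas: booleans with the diagnosed status, same order as scores.
    """
    tp = fp = tn = fn = 0
    for score, sick in zip(scores, has_osas):
        at_risk = score > threshold  # a score equal to the threshold is not at risk
        if at_risk and sick:
            tp += 1
        elif at_risk and not sick:
            fp += 1
        elif not at_risk and sick:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Sweeping `threshold` over the score range and recording both values shows the trade-off that motivates choosing a higher threshold when specificity matters more than sensitivity.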

Validation of clustering consistency
In this subsection, we verify that rendering the clusters in Fig. 6 is not mere serendipity, and that it is not induced by some fortunate heterogeneity of patients. To this end, we perform random shuffling and bootstrapping test investigations. Because our clustering methodology starts with a random state, namely it starts with the raw network where nodes are randomly placed and the links have corresponding lengths (see Fig. 5), our first shuffling test consists of running the procedure in Fig. 5 many times in order to see if we get statistically consistent results. Therefore, we run our dual clustering procedure 100 times on the same APD, and then measure the distribution of anthropometric values for each phenotype. The result of our random shuffling is given in Table 5, which presents average anthropometric measurements with standard deviations (expressed as percentages) for each APN cluster/phenotype after 100 runs. Indeed, the deviations from the average values are very small, emphasizing the consistency of our clustering procedure in Fig. 5.
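A run-to-run consistency figure of the kind described above (an average with a standard deviation expressed as a percentage) can be obtained as sketched below. The assumptions are that the percentage is the standard deviation relative to the mean and that the sample standard deviation is used; the source does not specify either, and the function name is hypothetical.

```python
from statistics import mean, stdev


def relative_std_percent(run_averages):
    """Standard deviation of per-run averages, as a percentage of their mean.

    run_averages: one value per clustering run, e.g. the average age of a
    given cluster in each of the 100 runs. Needs at least two runs.
    Assumption: sample standard deviation divided by the mean.
    """
    m = mean(run_averages)
    return 100.0 * stdev(run_averages) / m
```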
Our second test approach entails generating test APDs from the original patient dataset, in order to perform bootstrapping. To do so, we generate 10 new APD datasets with the same number of patients as the original APD by randomly selecting patients from the original APD database. Therefore, in the test APDs, some of the original patients may be missing, while others may be present two or more times. Next, we apply the same clustering methodology from Fig. 5 and find that the same phenotypes emerge.
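The resampling step of this bootstrapping test can be sketched as follows: each test dataset has the same size as the original and is drawn with replacement, so some patients are absent and others repeated. The function name and the `seed` parameter are hypothetical; the default of 10 datasets follows the text.

```python
import random


def bootstrap_datasets(patients, n_datasets=10, seed=None):
    """Draw n_datasets resamples of the patient list, each with replacement.

    Every resample has the same number of patients as the original list.
    seed is optional and only makes the draw reproducible.
    """
    rng = random.Random(seed)
    n = len(patients)
    return [rng.choices(patients, k=n) for _ in range(n_datasets)]
```

Each resample would then go through the same network construction and clustering pipeline as the original APD.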
The characteristics of the original APN phenotypes from Fig. 6 are provided in Table 6; Table 7 shows the averaged characteristics, for each cluster, over the 10 randomized APNs obtained by bootstrapping. In Table 7, distinct character types indicate how close (bold and normal characters) or how far (grey italics) the phenotype characteristics obtained by bootstrapping are from the original APN values.
The values from Table 7 indicate that the dominant characteristics of each cluster in the test APNs are very similar to the cluster characteristics from the original APN. As a result, even if we create new APDs with corresponding APNs by shuffling the patients from our dataset, the bootstrapping procedure yields the same phenotypes. The bootstrapping procedure reveals that some phenotypes emerge as more stable when randomized (e.g., clusters 1, 3, 4, 5), with a small offset compared to the original APN. The lack of convergence over multiple randomizations could be interpreted as a lack of representativeness as an OSAS phenotype. As such, clusters 2 and 6 are slightly less representative, and clusters 7 and 8 are notably variable, suggesting that the last two phenotypes may not be so well characterized, due to the reduced amount of patient data. We also create a test patient network (TPN), built in the same way as the APN, and then apply our dual clustering methodology. The result is presented in Fig. 10; upon visual inspection, the clusters that emerge in the TPN are similar to the clusters of the APN in Fig. 6, even if the number of patients is significantly smaller in TD. This result suggests that the association and convergence of risk factors in OSAS patients is indeed a non-random, consistent process.

DISCUSSION
The proposed method is not the first to cluster apnea patients (Joosten et al., 2012;Vavougios et al., 2016;Ye et al., 2014), but to the best of our knowledge it is the first network-based approach used for clustering apnea patients. Another important feature is that our networkbased methodology employs only easy-to-measure, objective clustering parameters (AHI is used only for phenotype evaluation). This way, our clustering methodology emphasizes the high complexity of OSAS phenotypes, from typical (cluster 1-yellow) to the less obvious ones (clusters 6-8).
When defining the eight apnea phenotypes, besides the force-directed layout, we also use modularity class clustering. In Fig. 6, the phenotypes based on modularity classes are generally consistent with the topological clusters resulted from applying the Force Atlas 2 layout. However, the visual inspection of Fig. 6 reveals that phenotypes 5, 6, 7, and 8 tend to spatially overlap; this tendency is much stronger for phenotypes 6, 7, and 8. Such a tendency for overlapping phenotypes that characterize patients with generally mild and moderate OSAS is also suggested by Joosten et al. (2012). This observation may indicate that these phenotypes are interrelated and generally hard to distinguish even in clinical practice. Still, some of the nodes in these clusters (e.g., cluster 8) have a clear tendency towards separation from the overlapping; this indicates that we probably need more patients/nodes, in order to completely segregate Cluster 8. As our APN will grow over time, the less convergent clusters 7 and 8 might become more representative. Indeed, the fact that even in the original APN from Fig. 6 clusters 7 and 8 present significant overlapping and that their topological segregation from other clusters is somehow fuzzy confirms the conclusion of our bootstrapping investigation. Nonetheless, we preferred to use the distinct modularity classes in conjunction with the topological clusters because they bring more information, i.e., more detail which can be useful for medical analysis.
From a medical standpoint, we note that our dual clustering method renders distinct male and female clusters; this observation is consistent with the state-of-the-art medical literature, which holds gender as a very important predictor of OSAS. For instance, in a 2009-2013 study on 272,705 patients from North America referred for home sleep apnea testing, clinical OSAS features are found to be more common in males than in females (Cairns, Poulos & Bogan, 2016). Other studies, performed on 23,806 (Gabbay & Lavie, 2012) and 1,010 (Vagiakis et al., 2006) patients, respectively, show a clear differentiation between the two genders in terms of AHI distribution and severity.
In current practice, the commonly used score for predicting sleep apnea is STOP-BANG. In comparison with STOP-BANG, our SAS Score significantly improves the prediction specificity (2.34 times better than STOP-BANG), while sensitivity is only slightly degraded. STOP-BANG has high sensitivity because it is a simple heuristic that was especially designed for perioperative patients, where it is essential to identify all potential risks associated with anaesthesia (including OSAS). To further emphasize the higher specificity of SAS Score, we mention that, by using our score, only 34% of the people in NAD are identified as being at risk of developing OSAS. Moreover, as opposed to STOP-BANG (a fixed questionnaire that cannot be adjusted to specific patient characteristics), SAS Score represents an adaptive methodology. Therefore, as the database grows, better sensitivity and specificity are expected. The classification tree which leads to rendering SAS Score, as described in section "OSAS risk prediction with SAS Score", represents a simplified application of our patient clustering/phenotyping method, but it has the advantage of being applicable in offline conditions, which makes it amenable to clinical practice and population screening. All these considerations indicate SAS Score as an appropriate tool for OSAS screening in large general populations. Our network-based method represents an application on patients from a given geographical area; therefore, we consider that it should be tested for other targeted populations. In such a case, the network analysis will render new, specific cluster average values (such as $BMI_{Cluster\ a}$, $NC_{Cluster\ a}$, $SBP_{Cluster\ a}$, $DBP_{Cluster\ a}$, $ESS_{Cluster\ a}$). Subsequently, SAS Score values that are specific to the targeted population can be rendered with the SAS Score equation.
However, we do not expect that the phenotypes or SAS Score will be significantly different for other populations, since the available medical studies, performed over diverse geographical areas (including a wide array of anthropometric characteristics), show that specific population traits are not particularly relevant for OSAS (Ralls & Grigg-Damberger, 2012;Lee et al., 2008;Villaneuva et al., 2005).
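As an illustration of how the cluster average values mentioned above could be re-derived for another population, the following minimal sketch averages each parameter per cluster once every patient has been assigned a cluster label. The patient records, the field names and the cluster labels are hypothetical, and the sketch deliberately stops at the averages: it does not reproduce the SAS Score equation.

```python
from collections import defaultdict
from statistics import mean

# Parameters that are averaged per cluster (field names are assumptions).
FIELDS = ("bmi", "nc", "sbp", "dbp", "ess")


def cluster_averages(patients):
    """Return {cluster label: {parameter: mean value}} for labelled patients."""
    grouped = defaultdict(list)
    for patient in patients:
        grouped[patient["cluster"]].append(patient)
    return {
        cluster: {field: mean(p[field] for p in members) for field in FIELDS}
        for cluster, members in grouped.items()
    }


# Hypothetical records; the cluster labels would come from the network analysis.
patients = [
    {"cluster": 1, "bmi": 34.0, "nc": 44.0, "sbp": 140.0, "dbp": 90.0, "ess": 12.0},
    {"cluster": 1, "bmi": 36.0, "nc": 46.0, "sbp": 150.0, "dbp": 94.0, "ess": 14.0},
    {"cluster": 2, "bmi": 27.0, "nc": 38.0, "sbp": 120.0, "dbp": 80.0, "ess": 6.0},
]

averages = cluster_averages(patients)
for cluster, values in sorted(averages.items()):
    print(cluster, values)
```

The resulting per-cluster averages are the only population-specific inputs discussed here; applying the method to a new population would then amount to recomputing them on the new network.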
Ultimately, due to its higher specificity, the SAS Score can be integrated into a large-area apnea screening and monitoring procedure, which aims at specifically discovering typical severe cases (easy to investigate with portable devices), without overcrowding sleep laboratories with false positive cases. This way, efficient personalized patient processing can be achieved by making use of prioritization according to the predicted severity level. For instance, this method can be a useful tool for sleep apnea screening in large population categories, such as professional drivers, since, at the European level, the new 2014/85/EU directive regarding professional drivers is recommended from January 2016. In this context, our website http://sasscore.appspot.com is a good example of a large-area, accessible OSAS risk prediction tool. Indeed, SAS Score can be conveniently computed in both clinical and population-monitoring practices, because it is implemented as easy-to-use smartphone and web-based applications (https://play.google.com/store/apps/details?id=aerscore.topindustries.aerscore&hl=en and www.pneumoresearch.ro). Processing the data and obtaining the prediction score requires less than 1 min per individual.
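The link between specificity and the number of false positive referrals can be sketched with a short calculation. All counts below are hypothetical and chosen only for illustration; they are not the results of our validation database, and the names `questionnaire` and `score` are placeholders for any two screening tools.

```python
def sensitivity(tp, fn):
    """Fraction of true OSAS cases that the tool flags as at risk."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-OSAS individuals that the tool correctly clears."""
    return tn / (tn + fp)


# Hypothetical screening of 1,000 people, 200 of whom have OSAS.
# tp/fn split the 200 OSAS cases; tn/fp split the 800 non-OSAS individuals.
questionnaire = {"tp": 190, "fn": 10, "tn": 200, "fp": 600}
score = {"tp": 175, "fn": 25, "tn": 480, "fp": 320}

for name, c in (("questionnaire", questionnaire), ("score", score)):
    sens = sensitivity(c["tp"], c["fn"])
    spec = specificity(c["tn"], c["fp"])
    print(f"{name}: sensitivity={sens:.3f}, specificity={spec:.3f}, "
          f"false positive referrals={c['fp']}")

# How many unnecessary sleep laboratory referrals the more specific tool avoids.
avoided_referrals = questionnaire["fp"] - score["fp"]
print("avoided false positive referrals:", avoided_referrals)
```

In this hypothetical setting, a moderate loss in sensitivity is traded for a much smaller number of healthy individuals sent to the sleep laboratory, which is the trade-off that matters when screening large general populations.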

CONCLUSION
This paper proposes a new OSAS patient clustering method based on complex network analysis, which leads to identifying OSAS phenotypes. This innovative network medicine approach is extended in order to compute SAS Score, a predictive score for OSAS based on six easy-to-measure, objective parameters. The proposed method uses big data and complex network analysis in order to achieve better specificity in OSAS prediction. As such, our SAS Score can conveniently be used in conjunction with the existing questionnaires for better OSAS prevention through population screening and monitoring, thus paving the way for a personalized patient management process.