A generalized uniform placement of alters on spherical surface (U-PASS) for visualizing general weighted networks

Network visualization is an important tool for extracting information from the structure and configuration of a network, especially when the network includes weighted edges and nodes with attribute information. Previous studies have demonstrated an effective visualization method that projects the network onto a spherical surface. In this work, we extend this method, known as Uniform Placement of Alters on Spherical Surface (U-PASS), to general weighted networks. This extension enables the uniform distribution of network nodes across multiple layers of concentric spheres. In addition, we discuss the lower bound of a criterion called generalized spherical cap discrepancy, which is used to evaluate the uniformity of node distribution on a collapsed spherical surface.


Introduction
In contemporary society, networks have become indispensable, playing pivotal roles in both daily activities and various academic disciplines. Their significance lies not only in connectivity but also in their capacity to visualize intricate relationships. This burgeoning field enhances intuitive comprehension of complex network structures, thereby enriching academic research and impacting diverse fields like business, science, and sociology. As network dimensions expand in the era of big data, the need for innovative visualization methods becomes increasingly urgent to address the challenges inherent in large-scale network analysis.
Various methods for network visualization have been developed, but their applicability varies depending on the specific insights sought from the network structure. This article focuses on the challenge of efficiently identifying nodes related to a specific node in a weighted edge network and evaluating the strength of these connections. Our proposed visualization method aims to address this issue, particularly benefiting networks such as patent and citation networks where understanding node relationships is crucial. Current visualization methods, exemplified in Fig. 1 (left), allow node observation but often fail to effectively convey the strength of correlations between nodes. Even approaches like Fig. 1 (right), which use edge thickness to denote correlation strength, do not facilitate quick identification of nodes related to the target node. Hence, new visualization techniques are needed to overcome these limitations and improve the clarity and efficiency of extracting relevant information from network structures.
One significant limitation hindering Fig. 1's effectiveness in displaying network structure is the constraint of display space. Expanding network visualization into three-dimensional (3D) space, as opposed to two-dimensional (2D) planes, can substantially enhance information conveyance. Current efforts to represent networks in 3D employ two primary approaches. The first involves combining multiple 2D plane networks to create a multilayer structure that illustrates correlations between different layers. For instance, Ahmed et al. (2005) assigned nodes to corresponding planes based on node degrees across layers, applied notably in protein-protein interactions, IEEE InfoVis citations, and collaboration networks. Additionally, Škrlj et al. (2019) developed a Python suite capable of graphically laying out multilayered networks, comparing its efficacy with solutions in other programming languages. However, these methods commonly utilize force-directed algorithms for node allocation, balancing attractive and repulsive forces between nodes but occasionally encountering node overlap issues. Moreover, they typically assume network relationships as simple connections without considering variations in edge strengths.
Another approach gaining traction is to represent networks on the surface of a sphere, a burgeoning research topic with diverse algorithms developed to tackle this challenge. In the realm of distributed networks on spheres, two primary methods prevail. The first method continues to rely on force-directed algorithms. For example, Schulz (2018) proposed an algorithm to lay out networks on a sphere, projecting them into lower dimensions to assess distances from a selected focal node to others. Among these algorithms, Doubly Stochastic Neighbor Embedding on Spheres (DOSNES) (Lu et al. 2019) represents a novel method based on stochastic neighborhoods, effectively normalizing node distribution amidst highly imbalanced data. The second method employs the self-organizing map (SOM) algorithm, an unsupervised neural network technique designed to project high-dimensional data into a two-dimensional space while preserving original data topology. Fu et al. (2007) extended SOM to distribute email networks on surfaces, yielding small-world networks but with challenges such as node and edge overlap. These challenges were addressed in part by the two-stage SOM (SOM2) algorithm of Wu and Takatsuka (2006), which enhances clarity at the cost of computational efficiency.
Despite these advancements, a persistent issue remains: node overlap, which complicates structural network analysis. A promising solution involves increasing the distance between node pairs in proportion to their closeness, akin to minimum energy design principles in experimental design. Building on this concept, our recent study (Huang and Phoa 2023) introduced Uniform Placement of Alters on Spherical Surface (U-PASS), tailored for visualizing ego-centric networks on a spherical surface. Ego-centric networks, characterized by a central node connected to all others, are uniformly distributed across a sphere using a three-step algorithm that maximizes minimum distances between nodes, optimized through particle swarm optimization (PSO). However, real-world network data often transcends ego-centric structures. Allocating all network nodes on a sphere may insufficiently represent network structure, prompting extension to multi-layer spaces, such as the two-layer concentric sphere proposed in Huang and Phoa (2024). Here, nodes connected to a randomly selected ego populate the first layer, while unconnected nodes populate the second.
Central to U-PASS is the imperative of uniform node distribution, linked to minimum energy design (MED) principles as discussed in Huang and Phoa (2023). Uniformity across methods is commonly evaluated with criteria such as discrepancy, which measures deviations between empirical and theoretical distributions and is critical for validating existing designs and searching for uniform solutions. Various criteria, from star discrepancy to Lee discrepancy, were comprehensively explored in Fang et al. (2018), yet standardized criteria for network node distribution remain lacking.
When considering a domain on a unit sphere $S^3 = \{z \in \mathbb{R}^3 : \|z\| = 1\}$, traditional criteria are no longer directly applicable. Introduced by Aistleitner et al. (2012), the spherical cap discrepancy (SCD) offers a natural criterion for assessing uniformity:

$$C_N(D) = \sup_{C(w,h)} \left| \frac{1}{N} \sum_{k=1}^{N} \mathbf{1}_{C(w,h)}(x_k) - \sigma^*(C(w,h)) \right|,$$

where $D = \{x_1, \dots, x_N\}$ represents an experimental design of $N$ points on the sphere, $C(w, h)$ denotes a spherical cap with center $w$ and height $h$, and $\sigma^*(C(w, h)) = \sigma(C(w, h)) / \sigma(S^3)$ normalizes the surface area of $C(w, h)$. The best-known lower bound of SCD $C_N(D)$, established in Beck (1984), is

$$C_N(D) \geq c(d)\, N^{-\frac{1}{2} - \frac{1}{2d}}$$

for any $N$-element set $D \subset S^d$, where $c(d)$ is a constant dependent only on $d$. While SCD was originally designed for independent points, the focus in our context shifts to network nodes, which exhibit dependencies rather than independence. Therefore, relying solely on spherical cap discrepancy proves inadequate for evaluating uniformity within a network on a spherical surface. Recognizing this, Huang and Phoa (2024) introduced the generalized spherical cap discrepancy (gSCD) criterion to address these complexities. This criterion accounts for varying node degrees, where nodes with larger degrees typically occupy smaller solid angles, akin to independent points with less weight. The formulation of gSCD involves quantities $G_i$ satisfying $\sum_i G_i = N$; see Huang and Phoa (2024) for the full expression. However, the statistical properties of gSCD have yet to be fully explored. Therefore, this work aims to investigate its lower bounds comprehensively.
In this work, we propose an extended algorithm derived from U-PASS to uniformly distribute network nodes on a multi-layer concentric sphere, applicable to general networks. Moreover, we introduce a novel criterion to evaluate the uniformity of network nodes on a spherical surface. The notations and definitions used in this study are detailed in Sect. 2. Section 3 outlines our method and discusses the lower bounds of the generalized spherical cap discrepancy. Simulations, result comparisons, and discussions are presented in Sects. 4 and 5, respectively.

Notations and definitions
We use notation similar to Huang and Phoa (2023), with slight modifications. We consider an undirected network $G(V, E)$, where $V$ is the set of nodes and $E$ the set of edges. One node, denoted as $v_{00}$, is designated as the ego node. Let $A$ be the ...

In practice, nodes $v_{ij}^d$ and clusters $C_i$ are represented as spherical caps, where each node $v_{ij}^d$ resides at the apex of a cap characterized by a polar angle $\theta_i^j$ and a solid angle. Similarly, each cluster $C_i$ is characterized by a solid angle. To define optimality in uniform node allocation, we first measure the distance $\varphi_{jj'}$ between two points $v_{ij}^d$ and $v_{ij'}^d$ on a unit sphere centered at $v_{00}$, taken as the angle between them. Uniform allocation is deemed optimal if it maximizes the minimum $\varphi_{jj'}$ value across all nodes $v_{ij}^d$ and clusters $C_i$ in $G(V, E)$. Note that adjustments are made to the distance between node pairs within a cluster based on the cluster's edge density. The conventional Beta index $\beta = \frac{\text{total number of edges}}{\text{total number of nodes}}$ poses a zero-denominator problem for a single point, leading us to define an adjusted Beta index $\beta_i = \frac{e_i + c_i}{c_i}$ to quantify the connectivity degree of $C_i$.
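As a minimal sketch of the two quantities above, the following computes the angular distance between two points on a sphere centered at the ego, the minimum pairwise angle of a point set, and the adjusted Beta index. The function names are hypothetical, and treating the distance as the plain angle between position vectors is an assumption; the density-based adjustment within clusters is not modeled.

```python
import math

def angular_distance(p, q):
    """Angle (radians) between two non-zero 3D vectors seen from the ego at the origin."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def min_pairwise_angle(points):
    """Smallest angle over all pairs; the quantity a maximin allocation tries to enlarge."""
    n = len(points)
    return min(angular_distance(points[i], points[j])
               for i in range(n) for j in range(i + 1, n))

def adjusted_beta(e_i, c_i):
    """Adjusted Beta index (e_i + c_i) / c_i for a cluster with e_i edges and c_i nodes."""
    return (e_i + c_i) / c_i

# Example: three mutually orthogonal directions and a 3-node cluster with 3 edges.
axes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(min_pairwise_angle(axes), adjusted_beta(3, 3))
```

With this definition a single isolated node ($e_i = 0$, $c_i = 1$) has an adjusted index of 1 rather than 0.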

Three-stage optimization method
We consider an arbitrary network and select a node of interest as the ego, while all other nodes are treated as alters. Using a pre-determined similarity matrix, we determine the degree of similarity of all alters to the ego. Based on these similarity degrees, we represent the network as an egocentric multi-layer concentric sphere network, assigning nodes to layers according to their similarity to the ego. Nodes with higher similarity are positioned closer to the ego on smaller radius spheres, while nodes with lower similarity are placed on larger radius spheres. The radius of each concentric sphere is determined by the reciprocal of the degree of similarity: each node $v_{ij}^d$ is placed on a layer whose radius $d$ is the reciprocal of its similarity to the ego. This approach ensures a more accurate representation of the relationships among network nodes, facilitating insightful visualizations. This visualization method resembles an ego-centric network and allows modifications to the U-PASS algorithm from Huang and Phoa (2023) to accommodate this more complex network structure.
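The layer assignment can be illustrated with a short sketch. It assumes similarities lie in $(0, 1]$ and that the proportionality constant is 1, so the radius is exactly the reciprocal of the similarity; the function names are hypothetical and not part of U-PASS.

```python
def layer_radius(similarity):
    """Radius of the concentric sphere for an alter with the given similarity to the ego."""
    # Assumption: similarity is in (0, 1]; zero similarity would give an infinite radius.
    if not 0.0 < similarity <= 1.0:
        raise ValueError("similarity must be in (0, 1]")
    return 1.0 / similarity

def project_to_layer(unit_point, similarity):
    """Scale a position on the unit sphere out to the node's own layer."""
    d = layer_radius(similarity)
    return tuple(d * c for c in unit_point)

# An alter with similarity 0.25 sits on the sphere of radius 4.
print(project_to_layer((0.0, 0.0, 1.0), 0.25))
```

Because the projection only rescales the position vector, the angles between nodes as seen from the ego are unchanged when nodes are moved to their own layers.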
To ensure optimal visibility of node distribution within the sphere network, we aim for uniform distribution, whether nodes are on their own layers or projected onto the same layer. Initially, all nodes are evenly distributed on the same layer to prevent overlap, then projected one by one onto their corresponding layers. The algorithm is listed in Algorithm 1, and the details are described below.
Step 1: Allocation of Cluster Center. This step mirrors Step 1 of Huang and Phoa (2023). Each cluster is treated as a spherical cap with a distinct solid angle. The size of the polar angle of each spherical cap is determined by the number of nodes and edges in the cluster, expressed as $\theta_i = \cos^{-1}(1 - 2r_i)$, where $r_i$ represents the weight proportion of the $i$-th scatter node. The weight function is defined as ... The allocation is governed by the angle between the cluster centers $p_i$ and $p_{i'}$, aiming to maximize the angle between the spherical caps. Figure 2 (top left) illustrates the result of allocating the two cluster centers.
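The relation $\theta_i = \cos^{-1}(1 - 2r_i)$ can be checked numerically. A cap with half-angle $\theta$ on a unit sphere has area $2\pi(1 - \cos\theta)$, that is, a fraction $(1 - \cos\theta)/2$ of the whole surface, so this choice of $\theta_i$ gives the cap a share of the sphere equal to $r_i$. The sketch below assumes that "polar angle" means this half-angle; the function names are hypothetical.

```python
import math

def cap_polar_angle(r):
    """Polar (half) angle of a spherical cap covering a fraction r of the sphere."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("weight proportion must be in [0, 1]")
    return math.acos(1.0 - 2.0 * r)

def cap_area_fraction(theta):
    """Fraction of the unit sphere's surface covered by a cap with half-angle theta."""
    return (1.0 - math.cos(theta)) / 2.0

# A cluster holding half of the total weight receives a hemisphere.
theta = cap_polar_angle(0.5)
print(theta, cap_area_fraction(theta))
```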
Step 2: Alter Allocation within Each Cluster. Nodes within each cluster are positioned on different layers based on their similarities to the ego. Initially, all nodes are considered collectively and assumed to be divided into $M$ communities. Each community's placement range is defined as a circular sector $m$, with an angle normalized by $\left(\sum_{m} w_m\right)^{-1}$, as shown in Fig. 2 (top right), where $w_m$ represents the weight function introduced in Step 1. The outermost nodes are uniformly positioned on the unit sphere within each sector, followed by the allocation of inner nodes to the remaining space, ensuring non-overlapping positions. This uniform distribution process maximizes the angle $\min_{j \neq j'} \varphi_{jj'}$ between pairs of nodes. Finally, the distances between the nodes and the ego are adjusted, and nodes are projected back onto the corresponding sphere.
Step 3: Allocation of Remaining Degree-1 Alters. This step closely resembles Step 3 in Huang and Phoa (2023). All nodes that do not belong to any cluster are distributed to the remaining space of the sphere by maximizing the minimum distance between pairs of nodes. Here, we also consider the similarity between alters and the ego. After determining the positions of the nodes, they are projected back onto the sphere where they should exist. Figure 2 (bottom) presents the diagram where all degree-1 alters have the same similarity.
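The maximin idea shared by Steps 2 and 3 can be sketched with a simple heuristic. The code below is not the PSO-based optimization used by U-PASS: it swaps in a greedy farthest-point selection over random candidate positions, ignores clusters, sectors and similarities, and uses hypothetical function names. It only illustrates what "maximizing the minimum distance between pairs of nodes" means on a unit sphere.

```python
import math
import random

def random_unit_vector(rng):
    """Uniform random point on the unit sphere (normalized Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in v))
        if n > 1e-12:
            return [c / n for c in v]

def angle(p, q):
    """Angle between two unit vectors, clamped for numerical safety."""
    return math.acos(max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q)))))

def greedy_maximin(n_points, n_candidates=500, seed=0):
    """Pick n_points candidates, each time taking the one farthest from those chosen."""
    if n_points > n_candidates:
        raise ValueError("need at least as many candidates as points")
    rng = random.Random(seed)
    candidates = [random_unit_vector(rng) for _ in range(n_candidates)]
    chosen = [candidates.pop()]
    # nearest[i] is the angle from candidate i to its closest chosen point.
    nearest = [angle(c, chosen[0]) for c in candidates]
    while len(chosen) < n_points:
        best = max(range(len(candidates)), key=nearest.__getitem__)
        p = candidates[best]
        chosen.append(p)
        nearest = [min(d, angle(c, p)) for d, c in zip(nearest, candidates)]
    return chosen

points = greedy_maximin(6)
print(min(angle(points[i], points[j]) for i in range(6) for j in range(i + 1, 6)))
```

A global optimizer such as PSO can improve on this greedy result, since each greedy choice is fixed once made.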

Generalized spherical cap discrepancy and its lower bound
We reviewed in the introduction that SCD (Spherical Cap Discrepancy) is a widely used criterion for evaluating the uniformity of point distributions on a sphere, and gSCD (generalized SCD) is a new criterion that accounts for the degree of network nodes. Uniformity is critical in the U-PASS method, as it allows for numerical comparison of the uniformity of results obtained by different methods, rather than relying solely on visual inspection. Therefore, it is essential to discuss the nature of this criterion.
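To make the SCD criterion concrete, the following sketch estimates it for a point set on the ordinary unit sphere in $\mathbb{R}^3$. It rests on several assumptions: a cap $C(w, h)$ is taken to be the set of points $x$ with $\langle w, x \rangle \geq 1 - h$ for height $h \in [0, 2]$, whose normalized area is $h/2$, and the supremum is replaced by a maximum over randomly sampled caps, so the result is only a lower estimate of the true discrepancy. The node weighting of gSCD is not modeled, and the function names are hypothetical.

```python
import math
import random

def random_unit_vector(rng):
    """Uniform random point on the unit sphere (normalized Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in v))
        if n > 1e-12:
            return [c / n for c in v]

def scd_estimate(points, n_caps=2000, seed=0):
    """Monte Carlo lower estimate of the spherical cap discrepancy of unit vectors."""
    rng = random.Random(seed)
    n = len(points)
    worst = 0.0
    for _ in range(n_caps):
        w = random_unit_vector(rng)   # cap center
        h = rng.uniform(0.0, 2.0)     # cap height
        inside = sum(1 for x in points
                     if sum(a * b for a, b in zip(w, x)) >= 1.0 - h)
        # Empirical share of points in the cap versus its normalized area h / 2.
        worst = max(worst, abs(inside / n - h / 2.0))
    return worst

octahedron = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]]
clustered = [[0.0, 0.0, 1.0]] * 6
print(scd_estimate(octahedron), scd_estimate(clustered))
```

A well-spread design such as the octahedron vertices yields a markedly smaller estimate than six points stacked at one pole, which is the behavior a uniformity criterion should show.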
In this section, we aim to discuss the lower bound of gSCD. Understanding the lower bound of gSCD is significant because it represents the optimal uniformity of node allocation. By knowing this lower bound, we can determine how far our results are from the optimal outcome. Unlike SCD, gSCD additionally considers the weight of nodes. Since the distribution position of nodes with different weights significantly impacts gSCD, it is not straightforward to directly derive the lower bound of gSCD. However, from a mathematical standpoint, it is reasonable to assume that gSCD should be larger than the value of SCD. Therefore, SCD can serve as a valuable reference point for estimating the lower bound of gSCD. In the simulation, we compared the gSCD of U-PASS (Huang and Phoa 2023) with other approaches mentioned in Huang and Phoa (2023), including SOM (Fu et al. 2007), two-step SOM (SOM2) (Wu and Takatsuka 2006), Christian Schulz's method (CS) (Schulz 2018), and DOSNES (Lu et al. 2019).
The simulation initially focused on a network containing 50 nodes, with detailed network structure provided in Appendix A. Following a preferential attachment model, the network was expanded incrementally by adding 50 nodes in each step, resulting in calculations for 10 distinct network sizes. The gSCD was computed for each network size using five different methods, and the results are presented in Fig. 3.
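The growth procedure can be sketched as follows. The initial 50-node network of Appendix A is not reproduced here, so a simple path graph stands in for it, and the number of edges attached per new node (`m`) is an assumption, since the text does not state it. The function name is hypothetical.

```python
import random

def grow_preferential(edges, n_nodes, n_new, m=1, seed=0):
    """Add n_new nodes, each linked to m existing nodes chosen proportionally to degree."""
    rng = random.Random(seed)
    edges = list(edges)
    # Every node appears once per incident edge, so a uniform draw from this
    # list selects a node with probability proportional to its degree.
    endpoints = [v for e in edges for v in e]
    for new in range(n_nodes, n_nodes + n_new):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((t, new))
            endpoints.extend((t, new))
    return edges

# Stand-in seed network: a path on 50 nodes labeled 0..49.
seed_edges = [(i, i + 1) for i in range(49)]
sizes = []
edges, n = seed_edges, 50
for step in range(9):
    edges = grow_preferential(edges, n, 50, seed=step)
    n += 50
    sizes.append(n)
print(sizes)
```

Starting from 50 nodes and adding 50 nodes nine times gives the 10 network sizes from 50 to 500 used in the comparison.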
As illustrated in the figure, both gSCD and SCD exhibit a declining trend as the number of nodes increases, with comparable rates of decrease. Remarkably, U-PASS consistently demonstrates a lower gSCD, indicating a more uniform distribution of nodes. The advantages of U-PASS are particularly pronounced when the number of nodes is small.

Real data example
In this section, we use real data to demonstrate the visualization capabilities of U-PASS. The dataset comes from the Department of Intellectual Property and Technology Transfer of Academia Sinica's website, containing 1,094 patents in various fields. For each patent, it includes attributes and a brief abstract describing its purpose and function. Typically, patents can be clustered by attributes such as the developing institute. However, the relationships between these patents have become more complex, making traditional labels less useful for describing cross-field patents and often meaningless for visualization purposes.
To address this challenge, we use the Doc2vec method, a robust technique for converting text to vectors based on each patent's abstract. We then construct pairwise similarities between patents by computing the Euclidean distance. A heuristic clustering method is applied to the similarity matrix, and U-PASS is used to visualize the relationships between patents. This approach differs from traditional clustering methods that rely solely on predefined attributes, focusing more on the patents themselves. The U-PASS algorithm facilitates easier analysis of the network and helps uncover insights about the patents. The main objective of this study is to enhance the accessibility and practical utility of patent datasets through meticulous and insightful visual representation.
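The distance computation in this pipeline can be sketched on toy data. The vectors below are made-up stand-ins for Doc2vec embeddings of patent abstracts, the function name is hypothetical, and the mapping from Euclidean distance to a similarity score is not specified in the text, so it is left out.

```python
import math

def pairwise_euclidean(vectors):
    """Symmetric matrix of Euclidean distances between all pairs of vectors."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(vectors[i], vectors[j])
            dist[i][j] = dist[j][i] = d
    return dist

# Toy stand-ins for document vectors (not real patent embeddings).
toy_vectors = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
for row in pairwise_euclidean(toy_vectors):
    print(row)
```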
In subsequent stages, we select a patent of specific interest as the ego node. The remaining patent nodes are allocated to different layers based on their similarity to the ego.

In conclusion, the layer structure of the graph layout vividly illustrates the correlation between alter nodes and the selected ego patent node. This graphical representation not only enhances understanding of the intricate relationships within a patent dataset but also simplifies the identification of patents closest to the ego patent by exploiting discernible distance differences in the visual representation. Overall, this multifaceted approach facilitates a more detailed exploration and interpretation of patent datasets, contributing to a deeper understanding of underlying relationships and patterns.

Fig. 1 (Left) The technology network (Yan and Luo 2016). (Right) The patent network (Lee et al. 2016).

The network contains $k$ clusters, $\{C_1, \dots, C_k\}$, each with size $|C_i| = c_i$ for $i = 1, \dots, k$. Nodes within each cluster at distance $d$ from the ego are denoted $v_{ij}^d$ for $i = 1, \dots, k$ and $j = 1, \dots, c_i$. Nodes connected to the ego node at distance $d$, excluding those within clusters, are denoted $v_{0j}^d$ for $j = 1, \dots, N$, where $N = |V| - \sum_i c_i - 1$. $E_i$ represents the set of edges in cluster $C_i$ with size $|E_i| = e_i$ for $i = 1, \dots, k$. The total number of nodes $|V|$ satisfies $|V| = \sum_{d=1} |V_d| + 1$, where $V_d$ is the set of nodes at distance $d$ from the ego.
We focus on the network structure under varying edge weights, defining a $|V| \times |V|$ similarity matrix $S$, where $S_{v_{is}^d v_{jt}^l}$ indicates the degree of dependence between network nodes, satisfying 0