Benchmark dataset for the Voronoi diagram of 3D spherical balls

In this paper, we present a dataset for the construction of the Voronoi diagram of 3D spherical balls (VD-B3). The dataset consists of sphere arrangements covering general, anomaly, and extreme cases, and it also includes protein models downloaded from the RCSB Protein Data Bank (PDB). It can be used as a standard benchmark to verify and validate the correctness, efficiency, and robustness of a construction algorithm. The dataset is simple and easy to understand. The details of the experiment and analysis based on this dataset are presented in the original research article, “Robust Construction of the Voronoi Diagram of Spherical Balls in the Three-Dimensional Space”, which introduces the topology-oriented incremental algorithm for the construction; that algorithm is thoroughly validated and compared with two implementations of the well-known edge-tracing algorithm.


Specifications
Subject: Geometry and Topology
Specific subject area: Computational geometry
Type of data: 1) Text files (each file represents an arrangement of 3D spherical balls); 2) PDB files (each file represents a protein model that consists of 3D spherical balls)
How the data were acquired: 1) Text files were generated by a C++ program on an Intel® Core™ i7-7700 3.60 GHz machine with 16 GB RAM running the Windows 10 operating system. 2) PDB files were downloaded from the RCSB Protein Data Bank.
Data format: 1) Text files: raw data; 2) PDB files: curated data
Description of data collection:
• Sets of random 3D spherical balls within a spherical container: BALLCLOUD, BALLSMALLSET[1,10], VISUALSET (Vis-I-10). Hereafter a ball denotes a 3D spherical one.

Value of the Data
• Applications of the Voronoi diagram of 3D spherical balls (VD-B3) include the analysis and the design of biomolecular structures (e.g., proteins) and material structures. The prediction of collisions between drones, airplanes, and satellites is also an emerging application. Despite its importance for such diverse applications, the robust construction of VD-B3 is extremely challenging and has long been an active research topic. A standard benchmark dataset is therefore desirable, or even necessary, for researchers.
• The dataset contains general cases (without an anomaly), anomaly cases, and extreme cases as well as protein models for validating the correctness, efficiency, and robustness of an algorithm that constructs VD-B3.
• Researchers who develop and/or evaluate VD-B3 algorithms can benefit from the dataset.
• The dataset can be used as a benchmark to compare the performance of VD-B3 algorithms.

Data Description
Here, we present a dataset to be used for the construction of the Voronoi diagram of 3D spherical balls (VD-B3). The dataset consists of six types of file collection: BALLCLOUD, BALLSMALLSET[1,10], EXTREMESET, ANOMALYSET, PROTEINSET, and VISUALSET. VISUALSET is for the visual check of constructed Voronoi diagrams, and the others are for computational experiments. PROTEINSET consists of PDB files describing the 3D structures of biomolecules. Except for those in PROTEINSET, balls do not intersect each other. The details of the experiments and analysis using this dataset can be found in the original research article [1]. The dataset can be downloaded from our Mendeley Data repository [2]. Hereafter, a ball denotes a 3D spherical one.

In BALLSMALLSET[1,10], the file BALL^j_small has 1,000 × j balls whose radii are from [1,10]. Each file BALL^j_small is named 'BALL_SMALL_j000.txt' and stored in the 'BALL SMALL SET' folder of the repository.

4) ANOMALYSET
ANOMALYSET = {ANO1, ANO2, ANO3, ANO4} is a set of four files that contain anomaly cases in VD-B3. In ANO1, two large balls with a small ball between them define an elliptic Voronoi edge (V-edge) e that has no Voronoi vertex (V-vertex). As a consequence, e is not connected to any V-edge in the big-world. In this sense, the case is called "0-connected", and e constitutes a small-world on its own. In ANO2, the ANO1 case occurs twice in a nested fashion. In ANO3, there are three tiny balls between two large balls. Their arrangement forms a small-world that consists of four V-vertices. This case is called "3-connected" because any two of the four V-vertices that are connected to each other are connected by three V-edges. In ANO4, the two V-vertices in the small-world are connected to each other by all four available V-edges. Hence, this case is called "4-connected". For details of the anomalies, see [3]. Each file is stored in the 'ANOMALYSET' folder of the repository.

5) PROTEINSET
PROTEINSET consists of 20 protein models downloaded from the RCSB PDB [4]. The protein models consist of atomic coordinates calculated using the Fourier transformation of electron density maps of a protein crystal [5]. Table 1 shows the 20 protein models and the number of atoms that constitute each model.
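Files in PROTEINSET can be read into ball centers with a few lines of C++. The following is a minimal sketch, not the authors' reader. It assumes the standard fixed-column PDB layout (record name in columns 1-6; x, y, and z in columns 31-38, 39-46, and 47-54; element symbol in columns 77-78) and keeps both ATOM and HETATM records. Whether HETATM records are used in the original experiments, and which radius is assigned to each element, are not stated in this text, so the sketch assigns no radii. The names Atom and readAtoms are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One atom of a protein model: center coordinates and element symbol.
struct Atom {
    double x, y, z;
    std::string element;
};

// Reads ATOM/HETATM records from a PDB-formatted stream (hypothetical helper).
// Lines too short to hold the coordinate columns are skipped.
std::vector<Atom> readAtoms(std::istream& in) {
    std::vector<Atom> atoms;
    std::string line;
    while (std::getline(in, line)) {
        if (line.size() < 54) continue;
        const std::string record = line.substr(0, 6);
        if (record != "ATOM  " && record != "HETATM") continue;

        Atom a;
        a.x = std::stod(line.substr(30, 8));  // columns 31-38
        a.y = std::stod(line.substr(38, 8));  // columns 39-46
        a.z = std::stod(line.substr(46, 8));  // columns 47-54
        if (line.size() >= 78) {
            // Columns 77-78 hold the right-justified element symbol.
            for (char ch : line.substr(76, 2)) {
                if (ch != ' ') a.element += ch;
            }
        }
        atoms.push_back(a);
    }
    return atoms;
}
```

To turn the atoms into balls, a radius table keyed by element symbol would be needed; the choice of that table is left open here.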

Experimental Design, Materials and Methods
For BALLCLOUD, BALLSMALLSET[1,10], EXTREMESET, ANOMALYSET, and VISUALSET, each file is created by the following generation rules. All codes are written in the C++ language and run on the Windows 10 operating system.

1) Sets of random balls within a spherical container: BALLCLOUD, BALLSMALLSET[1,10], VISUALSET (Vis-I-10)
The radius r_c of the spherical container C centered at the origin is calculated by Eqs. (1) and (2), where N, r_min, r_max, and ρ (= 0.1) are the number of balls, the minimum and maximum radii of the balls, and the expected packing ratio, respectively. This idea is a 3D version of a random disk generation rule [6]. The radius set R is only used to calculate the radius r_c of the container. Each ball is randomly generated in the axis-aligned bounding box of the container C. That is, each coordinate of the center point (x, y, z) and the radius r of a ball are picked from independent uniform distributions on [-r_c, r_c] and [r_min, r_max], respectively. If a ball is both completely inside the container C and intersection-free from the other balls, it is chosen. If not, a new randomly positioned ball with the same radius r is generated and tested again.
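The rejection-sampling rule for balls in a spherical container can be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not the authors' generator. Eqs. (1) and (2) are not reproduced in this text, so the container radius is assumed to follow from setting the total ball volume divided by the container volume equal to ρ, i.e. r_c = (Σ r_i³ / ρ)^(1/3) over the radius set R. The names Ball, containerRadius, insideContainer, disjoint, and generateBallCloud, as well as the retry cap maxAttempts, are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

struct Ball { double x, y, z, r; };

// ASSUMPTION (stands in for Eqs. (1)-(2)): choose r_c so that
// (sum of ball volumes) / (container volume) = rho, i.e. r_c^3 = sum(r_i^3) / rho.
double containerRadius(const std::vector<double>& radiusSet, double rho) {
    assert(rho > 0.0 && rho <= 1.0);
    double sumCubes = 0.0;
    for (double r : radiusSet) sumCubes += r * r * r;
    return std::cbrt(sumCubes / rho);
}

// True if ball b lies completely inside the origin-centered container of radius rc.
bool insideContainer(const Ball& b, double rc) {
    return std::sqrt(b.x * b.x + b.y * b.y + b.z * b.z) + b.r <= rc;
}

// True if the two balls do not intersect (touching counts as intersecting here).
bool disjoint(const Ball& a, const Ball& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    const double s = a.r + b.r;
    return dx * dx + dy * dy + dz * dz > s * s;
}

std::vector<Ball> generateBallCloud(std::size_t n, double rMin, double rMax,
                                    double rho, unsigned seed,
                                    std::size_t maxAttempts = 1000000) {
    assert(n > 0 && rMin > 0.0 && rMin < rMax);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> radiusDist(rMin, rMax);

    // The radius set R is used only to size the container.
    std::vector<double> radiusSet(n);
    for (double& r : radiusSet) r = radiusDist(gen);
    const double rc = containerRadius(radiusSet, rho);

    std::uniform_real_distribution<double> coordDist(-rc, rc);
    std::vector<Ball> balls;
    balls.reserve(n);
    while (balls.size() < n) {
        const double r = radiusDist(gen);  // the radius is kept across retries
        bool placed = false;
        for (std::size_t attempt = 0; attempt < maxAttempts && !placed; ++attempt) {
            // Candidate center drawn in the bounding box of the container.
            const Ball c{coordDist(gen), coordDist(gen), coordDist(gen), r};
            bool ok = insideContainer(c, rc);
            for (std::size_t i = 0; ok && i < balls.size(); ++i) ok = disjoint(c, balls[i]);
            if (ok) { balls.push_back(c); placed = true; }
        }
        if (!placed) throw std::runtime_error("could not place a ball within maxAttempts");
    }
    return balls;
}
```

A call such as `generateBallCloud(1000, 1.0, 10.0, 0.1, 42u)` would produce one arrangement with radii in [1,10]. The retry cap is a safeguard added for this sketch: for very small N a drawn radius may exceed what the container can hold, in which case the loop would otherwise never terminate.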
2) Sets of random balls touching a reference sphere: EXTREMESET, VISUALSET (Vis-V-20, Vis-VI-6, Vis-VII-20)
Assume that the reference sphere (RS) is centered at the origin O and has radius r_s. To generate the center point P of a ball tangent to RS from outside, a method to pick a random point Q on the surface of a unit sphere is used [7]. The center point P of a ball with radius r is generated by Eq. (3) from the picked point Q. If a ball is intersection-free from the other balls, it is chosen. If not, a new random ball with the same radius r is generated and tested again. Two consecutively chosen balls centered at P_1 and P_2 are further required to satisfy a constraint relating P_1 and P_2.

For ANOMALYSET, each file is generated manually to represent one anomaly case.
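The placement of a ball tangent to the reference sphere can be sketched in C++ as follows. This is a sketch under stated assumptions, not the authors' code. Eq. (3) is not reproduced in this text; since a ball of radius r that touches RS from outside must have its center at distance r_s + r from O, the sketch assumes P = (r_s + r) Q. The point Q is drawn with Marsaglia's rejection method, which is one standard way to sample the unit sphere uniformly and may differ from the method in [7]. The names Point, length, randomUnitVector, and tangentCenter are hypothetical, and the intersection test against previously chosen balls is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <random>

struct Point { double x, y, z; };

// Euclidean distance of p from the origin O.
double length(const Point& p) {
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
}

// Uniform random point Q on the unit sphere (Marsaglia's method):
// draw (a, b) uniformly in the unit disk, then map it onto the sphere.
Point randomUnitVector(std::mt19937& gen) {
    std::uniform_real_distribution<double> u(-1.0, 1.0);
    for (;;) {
        const double a = u(gen), b = u(gen);
        const double s = a * a + b * b;
        if (s >= 1.0) continue;  // outside the unit disk: reject and redraw
        const double f = 2.0 * std::sqrt(1.0 - s);
        return {a * f, b * f, 1.0 - 2.0 * s};
    }
}

// ASSUMPTION (stands in for Eq. (3)): center of a ball of radius r that is
// tangent from outside to the reference sphere of radius rs centered at O.
Point tangentCenter(const Point& q, double rs, double r) {
    assert(rs > 0.0 && r > 0.0);
    const double d = rs + r;
    return {d * q.x, d * q.y, d * q.z};
}
```

A full generator would wrap these two functions in the same accept-or-retry loop as the container case, keeping the radius r fixed while redrawing Q until the ball is intersection-free.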

Ethics Statements
This work did not involve human subjects, animal experiments, or data collected from social media platforms.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.