Benchmark dataset for the convex hull of 2D disks

In this paper, we present a benchmark dataset for evaluating algorithms that construct the convex hull of 2D disks. The dataset contains disk arrangements ranging from general to extremely biased cases, all generated by a C++ program. It accompanies the article “QuickhullDisk: A Faster Convex Hull Algorithm for Disks”, in which the QuickhullDisk algorithm is presented and compared with the incremental algorithm reported by Devillers and Golin in 1995 [1].


Data
The convex hull is one of the most fundamental constructs in geometry, and many algorithms for constructing it have been studied extensively. Here, we present a benchmark dataset for the convex hull of 2D disks, which contains four types of disk arrangements ranging from general to extremely biased cases.

1) RANDOM
A set RANDOM = {D_ij | i, j = 1, 2, …, 10}, where each D_ij is a data file containing 10,000 * i random disks. Thus, for each i there are ten different files (j = 1, 2, …, 10) with the same number of disks. The disks of D_ij are placed randomly within a circular container centered at the origin, whose radius is large enough that the packing ratio ρ is approximately 0.1 unless otherwise stated. Here, ρ is the ratio of the combined area of the individual disks to the area of the container. In RANDOM (and in the two other test sets, ON-BNDRY and MIXED), each disk has a random radius r ∈ [1.0, 10.0]; two disks may intersect each other, and one disk may contain another.
Specifications Table
Subject: Computational Mathematics
Specific subject area: Computational geometry
Type of data: Text files (each file represents an arrangement of disks)
How data were acquired: Generated by a C++ program on an Intel® Core™ i7-7700 (3.60 GHz) with 16 GB RAM running the Windows 10 operating system.

Value of the Data
The dataset can be used to evaluate the computation time of algorithms that construct the convex hull of 2D disks, and it can be useful to researchers in the computational geometry community. It also enables comparison of the solution quality and robustness of such algorithms across diverse cases.

2) ON-BNDRY
A set ON-BNDRY, where D_i contains 10,000 * i disks touching a circular container from inside.

3) MIXED
A hybrid of RANDOM and ON-BNDRY: MIXED = {D_ij | i, j = 1, 2, …, 10}, where D_ij contains 10,000 * j disks, of which 1000 * i * j touch a circular container from inside. The remaining 1000 * (10 − i) * j disks, which do not touch the container, are positioned randomly within it. Hence, the container touching ratio γ, defined as the ratio of the number of disks touching the container to the total number of disks, is well defined and equals i/10 for D_ij.
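The disk counts in the MIXED definition can be checked with a few lines of arithmetic. The following is a minimal sketch; the struct and function names are hypothetical and not part of the authors' generator, and the formulas are taken directly from the definition of D_ij.

```cpp
#include <cassert>

// Hypothetical helper: disk counts of the MIXED file D_ij as defined above.
struct MixedCounts {
    int total;     // 10,000 * j disks in the file
    int touching;  // 1000 * i * j disks touching the container from inside
    int random;    // 1000 * (10 - i) * j disks placed randomly inside
    double gamma;  // container touching ratio = touching / total
};

MixedCounts mixed_counts(int i, int j) {
    assert(1 <= i && i <= 10 && 1 <= j && j <= 10);
    MixedCounts c;
    c.total = 10000 * j;
    c.touching = 1000 * i * j;
    c.random = 1000 * (10 - i) * j;
    c.gamma = static_cast<double>(c.touching) / c.total;  // equals i / 10
    return c;
}
```

For example, mixed_counts(1, 1) describes a file of 10,000 disks of which 1000 touch the container, so γ = 0.1.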

4) ON-A-LINE
A set of congruent disks centered on a line (known as a sausage configuration [2]), where D_i contains 1000 * i disks placed on a linear grid from left to right. Each disk d ∈ D_i has unit radius (1.0), and the distance between the boundaries of two neighbour disks is kept at 0.5. Fig. 1 [1] shows examples of the four types of disk arrangements with a reduced number of disks.

File format
The same file naming convention is applied to RANDOM, ON-BNDRY, and ON-A-LINE: each file is named 'Nx.txt', where 'x' denotes the total number of disks in the file. For example, the file 'N10000.txt' of RANDOM contains 10,000 random disks. The file name of the MIXED type includes the percentage of disks touching the container as well as the total number of disks. For example, the file 'N10000_10.txt' in MIXED contains 10,000 disks, 10% of which (1000 disks) touch the container.
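The naming convention can be expressed as two small functions. This is a sketch under the assumption that the total and the percentage are written as plain decimal integers, as in the examples 'N10000.txt' and 'N10000_10.txt'; the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// RANDOM, ON-BNDRY and ON-A-LINE: "N<total>.txt".
std::string plain_file_name(int total) {
    return "N" + std::to_string(total) + ".txt";
}

// MIXED: "N<total>_<percent>.txt", where <percent> is the share of disks
// touching the container.
std::string mixed_file_name(int total, int percent_touching) {
    assert(0 <= percent_touching && percent_touching <= 100);
    return "N" + std::to_string(total) + "_" +
           std::to_string(percent_touching) + ".txt";
}

// Number of touching disks implied by the parameters of a MIXED file name.
int touching_count(int total, int percent_touching) {
    return total * percent_touching / 100;
}
```

With these helpers, mixed_file_name(10000, 10) gives 'N10000_10.txt', and touching_count(10000, 10) gives the 1000 touching disks mentioned above.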
The file format is shown in Table 1, where we added descriptions of the rows and columns. Fig. 2 shows an example of a data file corresponding to Table 1. In the RANDOM dataset, disks are stored in the order of generation. In the ON-BNDRY dataset, the rightmost disk touching the circular container appears first in the file, followed by the other disks in counterclockwise (CCW) order. In the MIXED dataset, the disks touching the container appear first, ordered as in ON-BNDRY, followed by the random disks. In the ON-A-LINE dataset, the leftmost disk appears first and the others follow from left to right.

Experimental design, materials, and methods
We generated the benchmark dataset with a program written in C++ on the Windows 10 operating system. The program's input parameters are the data type (T), the number of disks (N), the minimum radius (r_min) and maximum radius (r_max) of the disks, the packing ratio (ρ), the container touching ratio (γ), and the distance between the boundaries of two neighbour disks (d).

Disks in circular container
We fixed ρ, r_min, and r_max at 0.1, 1.0, and 10.0, respectively. The code starts by generating a circular container C centered at the origin, whose radius R is calculated by Eqs. (1) and (2).
Note. The generated set S is used only for calculating the radius of the container, not as the set of random disks.
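Eqs. (1) and (2) are not reproduced in this text. As a hedged illustration only, the following sketch assumes that ρ equals the combined area of the disks in S divided by the container area, ρ = Σ π r_k² / (π R²), which gives R = sqrt(Σ r_k² / ρ), and that the radii of S are drawn uniformly from [r_min, r_max]. The function names container_radius and sample_radii are hypothetical and are not the authors' code.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Container radius R such that the combined disk area equals rho times the
// container area: rho = sum(pi r_k^2) / (pi R^2)  =>  R = sqrt(sum(r_k^2) / rho).
// ASSUMPTION: one plausible reading of the packing ratio, not Eqs. (1)-(2).
double container_radius(const std::vector<double>& radii, double rho) {
    assert(rho > 0.0 && rho <= 1.0);
    double sum_sq = 0.0;
    for (double r : radii) sum_sq += r * r;
    return std::sqrt(sum_sq / rho);
}

// Draws the auxiliary set S of n radii uniformly from [r_min, r_max).
// These radii only size the container; they are not reused as disks.
std::vector<double> sample_radii(int n, double r_min, double r_max,
                                 unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(r_min, r_max);
    std::vector<double> radii(n);
    for (double& r : radii) r = dist(gen);
    return radii;
}
```

Under this assumption, the container for N disks would be sized as container_radius(sample_radii(N, 1.0, 10.0, seed), 0.1).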

1) RANDOM
Each disk is generated randomly in the axis-aligned bounding box of the container C (i.e., a random location in the box and a random radius between 1.0 and 10.0). If the generated disk lies completely inside C, it is stored in a list. If not, a new disk with the same radius is generated and tested again. Note that, unless otherwise stated, the random locations and radii are drawn from a random number generator seeded with the current time.
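The RANDOM procedure is a form of rejection sampling, sketched below. The sketch assumes that the disk center is drawn uniformly in the box [−R, R] × [−R, R] and that "completely inside" means |center| + r ≤ R. It takes a fixed seed instead of the current time so that runs are reproducible; the names Disk and generate_random are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct Disk { double x, y, r; };

// Rejection sampling for RANDOM: draw a radius once, then draw candidate
// centers in the container's bounding box until the disk fits inside the
// container (center distance from the origin + r <= R).
std::vector<Disk> generate_random(int n, double R, double r_min, double r_max,
                                  unsigned seed) {
    assert(r_max < R);  // otherwise some radii could never fit
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> radius(r_min, r_max);
    std::uniform_real_distribution<double> coord(-R, R);
    std::vector<Disk> disks;
    disks.reserve(n);
    for (int k = 0; k < n; ++k) {
        double r = radius(gen);
        for (;;) {  // retry with the same radius until the disk fits
            double x = coord(gen), y = coord(gen);
            if (std::hypot(x, y) + r <= R) {
                disks.push_back({x, y, r});
                break;
            }
        }
    }
    return disks;
}
```

Keeping the radius fixed across retries, as the text describes, avoids biasing the radius distribution toward small disks, which would otherwise be accepted more often.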

2) ON-BNDRY
Each disk, with a random radius, is generated to touch the container C from inside. The contact points on C are regularly spaced. The first generated disk is the rightmost extreme disk, and the others follow in CCW order.
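One way to realize regularly spaced contact points is to place the k-th contact point at angle 2πk/N on the container boundary. A disk of radius r touching C from inside at angle t then has its center at (R − r)(cos t, sin t). Starting at t = 0 makes the first disk the rightmost one, because its rightmost point reaches x = R while every other disk stays to the left of it. This is a sketch under those assumptions with hypothetical names, not the authors' generator.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct Disk { double x, y, r; };

// ON-BNDRY sketch: n disks touching the container (center at the origin,
// radius R) from inside, contact points every 2*pi/n radians, first disk at
// angle 0 (rightmost), the rest in counterclockwise order.
std::vector<Disk> generate_on_boundary(int n, double R, double r_min,
                                       double r_max, unsigned seed) {
    assert(n > 0 && r_max < R);
    const double kPi = std::acos(-1.0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> radius(r_min, r_max);
    std::vector<Disk> disks;
    disks.reserve(n);
    for (int k = 0; k < n; ++k) {
        double t = 2.0 * kPi * k / n;  // angle of the contact point
        double r = radius(gen);
        // The center lies on the ray toward the contact point, R - r from the origin.
        disks.push_back({(R - r) * std::cos(t), (R - r) * std::sin(t), r});
    }
    return disks;
}
```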

3) MIXED
First, disks touching the container boundary ∂C are generated as in ON-BNDRY. Then, disks not touching ∂C are generated as in RANDOM. Because a random disk could otherwise touch ∂C, each random disk is placed within a container shrunk by a factor of 0.99.
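The two MIXED phases can be combined in a single sketch. It assumes that the shrink factor scales the container radius, so each random disk must satisfy |center| + r ≤ 0.99R, which leaves a gap of at least 0.01R to ∂C. Names are hypothetical, and a fixed seed replaces the time-based seed.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct Disk { double x, y, r; };

// MIXED sketch: n_touch disks touching the boundary (as in ON-BNDRY) come
// first, followed by n_rand random disks confined to a shrunken container.
std::vector<Disk> generate_mixed(int n_touch, int n_rand, double R,
                                 double r_min, double r_max, unsigned seed) {
    const double kShrink = 0.99;  // ASSUMPTION: scales the container radius
    assert(n_touch > 0 && r_max < kShrink * R);
    const double kPi = std::acos(-1.0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> radius(r_min, r_max);
    std::uniform_real_distribution<double> coord(-R, R);
    std::vector<Disk> disks;
    disks.reserve(n_touch + n_rand);
    // Phase 1: touching disks, rightmost first, then counterclockwise.
    for (int k = 0; k < n_touch; ++k) {
        double t = 2.0 * kPi * k / n_touch;
        double r = radius(gen);
        disks.push_back({(R - r) * std::cos(t), (R - r) * std::sin(t), r});
    }
    // Phase 2: random disks, rejection-sampled against the shrunken container.
    for (int k = 0; k < n_rand; ++k) {
        double r = radius(gen);
        for (;;) {
            double x = coord(gen), y = coord(gen);
            if (std::hypot(x, y) + r <= kShrink * R) {
                disks.push_back({x, y, r});
                break;
            }
        }
    }
    return disks;
}
```

The output order matches the file layout described for MIXED: touching disks in ON-BNDRY order, then the random disks.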
Note. We confirmed that the difference between the computed packing ratio ρ of each data file and the fixed value (0.1) is less than 10^-3. The contact points of the disks on the container are distinct at 10^-6 precision.
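Both checks in the note can be sketched as follows. The packing ratio is taken as the sum of the disk areas over the container area, which is an assumption because overlaps are counted once per disk. For distinctness, the contact point of a touching disk lies on the ray from the origin through its center. Sorting the disks by angle and checking angularly adjacent pairs, including the wrap-around pair, is therefore enough, since chord length grows with angular separation. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Disk { double x, y, r; };

// Packing ratio: sum of disk areas divided by the container area (pi cancels).
double packing_ratio(const std::vector<Disk>& disks, double R) {
    assert(R > 0.0);
    double sum_sq = 0.0;
    for (const Disk& d : disks) sum_sq += d.r * d.r;
    return sum_sq / (R * R);
}

// True if the computed packing ratio is within tol of the target value.
bool packing_ratio_ok(const std::vector<Disk>& disks, double R,
                      double target, double tol = 1e-3) {
    return std::fabs(packing_ratio(disks, R) - target) < tol;
}

// True if the contact points of disks touching the container from inside
// are pairwise more than eps apart.
bool contact_points_distinct(const std::vector<Disk>& touching, double R,
                             double eps = 1e-6) {
    if (touching.size() < 2) return true;
    const double kPi = std::acos(-1.0);
    std::vector<double> angles;
    angles.reserve(touching.size());
    for (const Disk& d : touching) angles.push_back(std::atan2(d.y, d.x));
    std::sort(angles.begin(), angles.end());
    for (std::size_t k = 0; k < angles.size(); ++k) {
        double gap = (k + 1 < angles.size())
                         ? angles[k + 1] - angles[k]
                         : angles.front() + 2.0 * kPi - angles.back();
        // Chord length between two boundary points separated by angle gap.
        if (2.0 * R * std::sin(gap / 2.0) <= eps) return false;
    }
    return true;
}
```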

Disks in sausage configuration: ON-A-LINE
We set both r_min and r_max to 1.0. The first disk is centered at the origin, and the other disks are centered on the x-axis with d kept at 0.5.
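Because neighbouring unit disks are separated by a boundary gap d, consecutive centers are 2r + d = 2 × 1.0 + 0.5 = 2.5 apart. The sketch below assumes the disks extend in the positive x direction from the origin, which is consistent with the left-to-right file order. The function name is hypothetical.

```cpp
#include <cassert>
#include <vector>

struct Disk { double x, y, r; };

// ON-A-LINE (sausage configuration) sketch: n congruent disks of radius r on
// the x-axis, the first centered at the origin, listed from left to right.
// Consecutive centers are 2r + d apart so that the boundary gap equals d.
std::vector<Disk> generate_on_a_line(int n, double r = 1.0, double d = 0.5) {
    assert(n > 0 && r > 0.0 && d >= 0.0);
    std::vector<Disk> disks;
    disks.reserve(n);
    for (int k = 0; k < n; ++k) disks.push_back({k * (2.0 * r + d), 0.0, r});
    return disks;
}
```

For D_i with 1000 * i disks, generate_on_a_line(1000 * i) places the last center at (1000 * i − 1) × 2.5 on the x-axis.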
Note that all data files are freely available from our dataset repository [3]; they can be used with the QuickhullDisk programs provided in the same repository and by the Voronoi Diagram Research Center, Hanyang University (http://voronoi.hanyang.ac.kr/).