Elementary Methods for Generating Three-Dimensional Coordinate Estimation and Image Reconstruction from Series of Two-Dimensional Images

Department of Data Science, CHRIST (Deemed to Be University), Pune Lavasa Campus, Pune, India; Department of Computer Science, CHRIST (Deemed to Be University), Bangalore, India; School of Engineering and Applied Sciences, Bennett University, Noida, India; Department of Information Technology, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif 21944, Saudi Arabia; Department of Computer Science, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif 21944, Saudi Arabia


Introduction
Three-dimensional image processing comprises the visualization, processing, and analysis of three-dimensional image datasets. With the increase in computational power, image processing techniques have evolved from traditional two-dimensional analysis to the generation and analysis of spaces and objects in three dimensions. This leap in mobile computation has supported research on the three-dimensional representation of common objects [1]. Applications such as virtual reality and augmented reality require real-world objects to be mapped into digital representations for a seamless experience.

Descriptors and Representation of Objects.
Solid physical objects have a definite shape and size. The size of an object can be seen as the outline of the dimensions defined by its length, width, and height [2]. An object in three dimensions, when captured, essentially yields a view of the object from the perspective of the capturing device (such as a camera).
This results in a two-dimensional Cartesian plane in which each unit is a pixel, represented as a combination of red, green, and blue. Thus, image capturing is a function that generates two-dimensional coordinates of a scene as viewed from the capturing angle [3].
Consider the case of capturing a tangible convex object. The shape descriptors of the object might differ when it is seen from different angles or sides. Fundamentally, there are two types of objects with respect to this property:
(1) Objects having a uniform exterior shape (Type A)
(2) Objects that do not have a uniform shape (Type B)
Here, the "shape" corresponds to the outline of the object that is described by its edges. The shape of the first type of object can be expressed by a single uniform descriptor, whereas the second type requires description from multiple angles. For instance, a football belongs to the first category, since its shape can generally be described as "round" or "spherical", whereas an object like a stapler cannot be described by one universal shape. Even though the nature of the shape remains uniform for the first category, the colour and texture might vary throughout the object [4,5]. The edges of an object belonging to the second category, when captured, cannot describe the shape of the overall object, as a single image does not give the views from other angles [6]. The object in three dimensions is thus represented without taking the "depth" of the object into account [7].
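The loss of depth described above can be illustrated with a minimal sketch. It assumes an ideal pinhole camera, which is not stated in the text; the function name `project_pinhole` and the focal length value are hypothetical and chosen only for illustration.

```python
def project_pinhole(point, focal_length=1.0):
    """Map a 3D point (X, Y, Z) to 2D image coordinates (x, y).

    Assumes an ideal pinhole camera at the origin looking along +Z.
    This camera model is an illustrative assumption, not the paper's.
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)


# Two different 3D points on the same viewing ray, at different depths.
near_point = (1.0, 2.0, 4.0)
far_point = (2.0, 4.0, 8.0)

# Both land on the same 2D coordinate, so depth cannot be read
# back from a single captured image.
same_pixel = project_pinhole(near_point) == project_pinhole(far_point)
```

Because both points project to the same image coordinate, a single view cannot distinguish them, which is why several views around the object are needed.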

Applications.
There are numerous applications where three-dimensional object models are used as primitive components. Augmented reality and virtual reality applications are the most commonly observed examples. Others include off-site control and demonstration of technical work, tutorials for equipment, simulation of medical operations, and guides for fitting components [8,9]. Such generated three-dimensional models also play an important role in multimedia applications where mock-ups of objects are used. These include video calls with objects present near the caller, 3D viewer applications, and so on.
For any of these applications, accurately created object models are essential. Whether the objective is to create an immersive experience for people or to create a composite view of objects, the rendering process requires prebuilt object models [10,11]. Users often compare the available models with the actual real-world objects. Therefore, a novel method for generating at least the preliminary structure of the objects is required.

Background
Given an object, a designer has to spend a considerable amount of time designing each component with respect to its depth, shape, and other characteristics in computer-aided design (CAD) software. The design process is tiring and requires skilled labour, as every component has to be designed individually [12,13]. The use of three-dimensional models in applications is becoming increasingly popular. There is a significant trade-off between the quality of the generated models and the manual effort applied in the modelling process. No single mapping model can be applied universally to all objects. The relative spatial arrangement of features must be taken care of at all levels of the object design process. This study suggests methods to convert an object into its three-dimensional mapping. The suggested framework considers generic objects [14,15].

Existing Studies.
Various studies discuss algorithms for generating three-dimensional maps from stereoscopic images, SAR images, and so on. Recent digitalization calls for progress beyond the limitations of traditional photo processing techniques.
There are studies that compare existing reconstruction algorithms used in a variety of applications.
The survey focusing on triangulation and stereo vision [16][17][18][19] compared the speed, accuracy, and practicality of different algorithms used in the motion-parallax scenario.
This study also enumerates different approaches, such as image-based, voxel-based, and object-based approaches, for scene and geometric parallax reconstruction [20,21]. Even though there was no mention of the generation of object coordinates, the bounding box method was used on objects [22] for pose estimation.
SV3DVision [23] is a depth-map generating algorithm used for the reconstruction of scenes based on a single-photograph input. While this method uses a single photograph, a more calibrated method [24] using parallel axes relies on a stereoscopic system. Both studies focus on identifying near-placed and far-placed objects for depth-map generation, with robotic vision applications in mind [25]. A silhouette-based method [26,27] based on the volume intersection approach is also available in the 3D model reconstruction research area. That study, however, uses camera calibration and bounding cube estimation for silhouette extraction using triangulation and decimation.
Apart from the algorithmic methods discussed in the previous studies, approaches based on neural networks [28,29] are also available. These studies perform depth estimation of the human body, faces, videos, and so on, and suggest adversarial methods for reconstruction in single-view and multi-view approaches [30,31].
To ease the computer-aided design process, methods should reconstruct objects from the fewest possible images. Even when the approach follows stereoscopic vision, the triangulation method can still be used for the reconstruction of a scene, with features mapped for triangulation [14,32]. However, this requires special camera setups and is therefore not suitable for everyone. This problem is eliminated in the study [33], where stereo sequences captured using handheld cameras are considered for reconstruction. That method generates disparity maps and object boundaries in textureless regions [34]. Although this is the only study that discussed handheld cameras, it also focused on scene/space reconstruction, whereas our requirement is to generate the same for objects [13].
The different techniques, such as binocular disparity, motion parallax, image blur, linear perspective, triangulation, and the silhouette process, have their own advantages and disadvantages [35]. We have to consider the shortcomings and advantages of each of these approaches to come up with a better solution.
We observe that the space reconstruction and remodelling domain has improved considerably over the past years in terms of implemented methods. Research studies on parallax photography help in reconstructing the dense three-dimensional geometry of a space or a scene. However, with respect to the reconstruction of small objects based on different viewpoints, progress has been limited [35], and studies related to object estimation and reconstruction remain uncommon. It is important to note that none of the studies reviewed facilitates a faster computer-aided design process. We believe that this study will serve as an initiative in the field of three-dimensional object reconstruction, as none of those studies gives any indication towards that goal.

Problem Statement.
Computer-aided design produces a highly accurate representation of objects for visualization, dimensional analysis, and other applications. Objects are represented either in three-dimensional metric spaces or by vector edges. Based on this fact, the research problem is to generate metric points in three dimensions from an input set of images taken around an object. Figure 1 represents the overall system of our research.
This paper discusses an algorithm for generating preliminary object models that can be used for further processing. The study focuses only on replicating the external structure of a given object; it does not attempt to replicate or predict the internal structure of the object. A visual comparison of some input objects and generated object models is also given.

Methodology
It is observed that the generation of three-dimensional models usually requires highly sophisticated and costly capturing equipment. Our goal is to provide a framework that is accessible to non-specialists and in which ordinary handheld cameras can be used. It is also important to discuss the scope of this goal. Ideally, objects should also be captured from the top and the bottom; however, that is outside our scope. Consider the standard plane representation of a Rubik's cube as given in Figure 2. With our method, we can construct the primary shape of the cube from the left, front, right, and back sides (excluding the white and yellow faces); that is, we reconstruct the shape surrounding the given object and not its top and bottom views.
Our problem statement implies that we require a series of images as input. There are a good number of 3D object databases, such as IKEA3D, LDOS, ObjectNet3D, and Thingi10K, but those are unusable in our case. Of the available datasets, COIL-100 [11] is the most suitable one for our needs.

Dataset Description and Relevance.
We use the COIL-100 dataset for carrying out the study. COIL-100 was collected by the Center for Research on Intelligent Systems at the Department of Computer Science, Columbia University. It is an image-based dataset that consists of colour images of 100 different small objects taken at different angles. To be more precise, the entire 360° view is divided into 72 positions, 5° apart from each other, and an image is captured from each position. The sizes of the images are normalized for homogeneity in terms of image properties. The background of each image is black, and the set of objects has a wide variety of complex geometric and reflectance characteristics. Even though this dataset was created to identify the angular pose of the image, we use the same set of images to attempt a 360° object reconstruction. Figure 3 shows the various classes that are present in the COIL-100 dataset. All 100 objects in the dataset are used in our study for replication purposes. Figure 4 shows how each object is captured from different angles. For each object, all these images are taken and processed to generate models that are visually similar.
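The pose layout described above (72 positions, 5° apart) can be enumerated with a short sketch. The file naming pattern used in `coil_filename` is an assumption about how the downloaded dataset is organised and should be checked against the actual files; the helper names are hypothetical.

```python
N_VIEWS = 72                  # positions per object, as stated in the text
STEP_DEG = 360 // N_VIEWS     # 5 degrees between consecutive positions


def view_angles(n_views=N_VIEWS):
    """Return the capture angle, in degrees, of every view of one object."""
    return [i * 360 // n_views for i in range(n_views)]


def coil_filename(obj_id, angle_deg):
    """Build an image file name for one object and one angle.

    The pattern 'obj<ID>__<angle>.png' is an assumption for illustration.
    """
    return f"obj{obj_id}__{angle_deg}.png"


# All file names that make up the input sequence of object 1.
sequence = [coil_filename(1, a) for a in view_angles()]
```

The resulting list preserves the angular order of the views, which the later steps rely on when they index images by position.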

Testing and Evaluation Metrics.
The final design criterion is the approach used for comparing the results with the input images.
The COIL-100 dataset provides images of various objects captured at different angles. Hence, the modelled outputs can be visually compared with the input image set. There are no defined mechanisms for comparing images with models; therefore, only a visual comparison of the models and the input image set is possible. In this paper, we give a step-by-step account of the process for four different objects randomly selected from the COIL-100 dataset.

Proposed System
The literature review in the earlier section showed several approaches currently practised for generating three-dimensional maps from a pair or more of images.
The proposed system is shown in Figure 5. Since our primary objective is to provide paths (or vectors) as the final result, we cannot use volumetric or voxel-based reconstruction. Our approach does not have a target model; hence, there is no scope for using any type of neural network, even though such networks are good at giving predictive results. These considerations lead to the overall framework of Figure 5.
Our problem cannot be categorized as a simple triangulation method or image binding method, as we have a series of images as input. Hence, the features present in each image play an important role in the reconstruction.

Steps of Proposed System.
The steps of the proposed system are explained in simple terms below. The entire process is divided into two parts: the estimation process and the regeneration process. The algorithm discussed in the next section supports the following steps for the reconstruction process:
Step 1. Preparing input image sequence: A series of 72 images per object is taken as the input of the system. Any object in the COIL-100 dataset can be used as input.
Step 2. Classification of images: For our purposes, there are two types of objects, Type A and Type B, as mentioned in the introduction. The approaches for Type A and Type B are different; hence, each object has to be classified as one of the two types. The steps are to remove the background, extract edge descriptors, and find the variance of the edge features. If the variance is above a threshold T, the object is of Type B; otherwise, it is of Type A. If the object is of uniform shape (Type A), the variance of its edges will not be high.
Step 3. Texture removal: Objects in general have their own outer border shape, which is referred to as the "outline" of the object throughout this paper. The proposed system is able to extract and reconstruct this outline. The reconstruction process requires exact edges as its input. If an object has textures, designs, or differences of contrast on its body, it is important to remove those low-level features, since any variation in the texture could be treated as an object edge by the proposed regeneration algorithm. This could result in overfitting of the system. This step combines edge detection and adaptive thresholding. Once the small edges (low-level features) are identified, the adaptive threshold helps in filling the area with either of the binary colours. This prepares the object image for defining its outer boundaries.
Step 4. Defining object boundaries: This step plays a key role in the entire process. For objects belonging to Type A, the coordinate estimation process ends here, as the average of all coordinate borders is smoothed and stored. For objects belonging to Type B, individual extreme border paths are created. The leftmost and rightmost extremes define the distance magnitude of a pair of points. When all images of a given object are used, n = 72. For the i-th image, these extremes define the mod(i − n/4, n)-th and mod(i − n, n)-th data points, respectively, as shown in Figure 6.
The generated values are stored as a new entry in a (3 × 72) matrix.
Step 5. Generating edge coordinates from object boundaries: The shape of the object is finalized in this step. We assume that the generated edges are of good accuracy. Based on the features extracted in the previous step, we define the entire object boundary and shape. We use a regional edge linking process for this, which assumes that the edge points are well defined and ordered. Since we have the object boundaries of each image, the edge linking process is repeated over those images. We assume that the outline of a convex object will always be a closed polygon. For each image, the edges
Step 6. Representing features of all images: As per our goal, we have to represent our coordinates mathematically in terms of vectors. The quality of the images improves markedly after this step, as the generated matrix is normalized and the values are smoothed and converted into paths.
Step 7. Combining features and converting edges to mathematical paths: This is the most essential step in the reconstruction process. The normalized input matrix is converted to image representations to find connectivity, using the proposed algorithm. This is repeated for all the images to generate a file that can be opened in a CAD tool. The raster-to-vector conversion allows the new object to be resized to any extent.
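Parts of Steps 2, 4, and 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the silhouette width is taken as the per-view edge feature, the background is removed with a fixed brightness level, and the vector output is written as an SVG-style path string. All function names, the brightness level, and the threshold value are hypothetical.

```python
from statistics import pvariance


def remove_background(gray, level=10):
    """Binary mask: 1 where a grayscale pixel is brighter than the black
    background. The brightness level is an illustrative assumption."""
    return [[1 if px > level else 0 for px in row] for row in gray]


def row_extremes(mask):
    """Leftmost and rightmost foreground column per row (None if empty)."""
    out = []
    for row in mask:
        cols = [x for x, v in enumerate(row) if v]
        out.append((cols[0], cols[-1]) if cols else None)
    return out


def silhouette_width(mask):
    """Widest foreground span in one view, used here as the edge feature."""
    spans = [e[1] - e[0] + 1 for e in row_extremes(mask) if e is not None]
    return max(spans, default=0)


def classify(masks, threshold):
    """Step 2: Type B if the feature variance across views exceeds T."""
    widths = [silhouette_width(m) for m in masks]
    return "B" if pvariance(widths) > threshold else "A"


def extreme_indices(i, n=72):
    """Step 4: data-point indices mod(i - n/4, n) and mod(i - n, n),
    reproduced as written in the text, for the i-th image."""
    return (i - n // 4) % n, (i - n) % n


def to_svg_path(points):
    """Step 7: closed polygon as a vector path string (SVG path syntax
    is an assumed output format, chosen only for illustration)."""
    head, *tail = points
    parts = [f"M {head[0]} {head[1]}"] + [f"L {x} {y}" for x, y in tail]
    return " ".join(parts) + " Z"


# Two tiny synthetic views whose silhouettes differ in width.
narrow = remove_background([[0, 200, 200, 0], [0, 200, 200, 0]])
wide = remove_background([[200, 200, 200, 200], [0, 200, 200, 0]])
object_type = classify([narrow, wide], threshold=0.5)
```

A path string scales without loss, which is the property the raster-to-vector conversion in Step 7 relies on.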

Image Outputs at Different Stages.
The images show the effect of our processes at various steps. Four objects at four different angles are given below for easier comprehension. Using the features extracted in the above steps, our algorithm is implemented to generate the result below. The processed result is shown in Figure 8. The shapes in the given

Proposed Algorithm for Combining Images.
The input to the algorithm below is the array of images captured at different angles (5° apart). A common feature is identified between pairs of images, which is then used for the depth analysis.
Step 1. Let i and j denote any two images in the set of images. The epipoles are computed from the 2D features of the i-th and j-th images in the input set.
Step 2. The projection matrices are estimated using both images. The i-th view is used as the zero-orientation image, and the projection matrix for the j-th view is computed from the reference frame and the epipole reconstruction, along with the image feature matrix of the i-th image.
Step 3. An initial estimate of the 3D point coordinates is obtained through triangulation.
Step 4. For all the remaining pairs of views, the estimation is done by repeating Steps 1 to 3. The result is an unrefined 3D coordinate estimate for each pair of images, with errors between the elements of the estimated result set.
Step 5. The obtained result set is optimized to refine the coordinates by comparison. The reprojection error of each estimated three-dimensional point is reduced. As a result, we obtain better projective coordinates of the images.
Step 6. The projective coordinates are transformed to three-dimensional metric coordinates by assuming a ground truth.
Step 7. Multiple 3D points are triangulated for the estimation of the three-dimensional structure.
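The triangulation in Step 3 can be sketched for a single pair of views. The example below is a minimal linear triangulation under assumptions that are not in the text: the projection matrices are taken as known, the views follow an idealised turntable geometry (rotation about the vertical axis, camera at a fixed distance), and image coordinates are noise-free. The helper names and the distance value are hypothetical.

```python
import math


def turntable_camera(angle_deg, distance=4.0):
    """3x4 projection matrix [R | t] for a view rotated about the Y axis.
    The turntable geometry and distance are illustrative assumptions."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, distance]]


def project(P, X):
    """Project 3D point X with projection matrix P; returns (u, v)."""
    Xh = (X[0], X[1], X[2], 1.0)
    x, y, w = (sum(p * c for p, c in zip(row, Xh)) for row in P)
    return x / w, y / w


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [list(A[i]) + [b[i]] for i in range(3)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][k] * x[k] for k in range(i + 1, 3))) / M[i][i]
    return x


def triangulate(P1, uv1, P2, uv2):
    """Least-squares 3D point from two views (linear triangulation)."""
    rows = []
    for P, (u, v) in ((P1, uv1), (P2, uv2)):
        # Each view gives two linear constraints on the homogeneous point.
        rows.append([u * P[2][k] - P[0][k] for k in range(4)])
        rows.append([v * P[2][k] - P[1][k] for k in range(4)])
    # Normal equations for the inhomogeneous unknowns (X, Y, Z).
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(-r[i] * r[3] for r in rows) for i in range(3)]
    return solve3(AtA, Atb)


# One synthetic point seen from two views that are 5 degrees apart.
P_i, P_j = turntable_camera(0), turntable_camera(5)
point = (0.5, 0.2, 0.3)
estimate = triangulate(P_i, project(P_i, point), P_j, project(P_j, point))
```

With real features the image coordinates are noisy, so the estimate only approximates the true point; this is the error that the refinement in Step 5 reduces.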
We can see that the algorithm runs for each pair of images, with O(n^2) complexity, where n is the number of images used for feature construction.
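The quadratic growth can be made concrete by counting the image pairs. The sketch below assumes that every unordered pair of views is processed once, which is one reading of the statement above; the function name is hypothetical.

```python
from itertools import combinations


def view_pairs(n):
    """All unordered pairs (i, j), i < j, among n views."""
    return list(combinations(range(n), 2))


# For the full COIL-100 sequence of one object (n = 72).
pairs_full = len(view_pairs(72))   # n * (n - 1) / 2 pairs
pairs_half = len(view_pairs(36))   # same count for half as many views
```

Halving the number of views reduces the number of pairs to roughly a quarter, which is the practical meaning of the O(n^2) bound.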

Observations
Our primary requirement was to generate models that are as similar as possible to the input objects. Pixel matching based on the mathematical computation of accuracy measures cannot be done because the input and output representations are dissimilar.
The system estimates stably posed 3D bounding boxes without additional 3D models.

Accuracy of the Proposed System
The accuracy of the output depends on the number of images used in the process. Since the COIL-100 dataset has images of very low dimensions, the variation in the path was very high during the generation process. We suggest that a larger image size would help in achieving better accuracy. Objects of Type A have better visual accuracy than objects of Type B.

Variations with Parameters.
The only variable parameter is the number of images considered in the input array. Our observations show that the number of images is directly proportional to the accuracy of the final output. Table 1 shows a basic comparison of the quality of the output against the number of images used. A graphical representation of this quality comparison based on image count is given in Figure 9.
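Varying this parameter amounts to choosing a subset of the 72 views. A minimal sketch of one way to do so is given below; it assumes that the views are kept evenly spaced around the object, which the text does not specify, and the function name is hypothetical.

```python
def subsample_views(n_total, n_used):
    """Indices of n_used views spread evenly over n_total positions."""
    if not 1 <= n_used <= n_total:
        raise ValueError("n_used must be between 1 and n_total")
    return [(k * n_total) // n_used for k in range(n_used)]


# Eight views out of the full sequence: one every 45 degrees.
selected = subsample_views(72, 8)
selected_angles = [5 * i for i in selected]
```

Running the reconstruction on such subsets of increasing size is one way to reproduce a comparison of output quality against image count.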

Advantages.
The images given above directly show the visual comparison of the expected and obtained outputs. The observed advantages are as follows:
(1) The process of computer-aided design of objects is sped up.
(2) Vector paths are produced instead of voxel or volumetric outputs.
The observed limitations are as follows:
(1) The texture and the volume of the objects are not captured during the reconstruction process.
(2) The method works only for convex-shaped objects and not for spaces.
(3) Only a preliminary outline of the object is generated. For the object model to be used in actual real-world applications, the generated shapes must be further processed.

Future Scope.
We believe that this study is preliminary and will lead to a new area of research, as it is, to our knowledge, the first attempt to automate the three-dimensional object generation process. There is still scope for implementing the same approach with another dataset, as all our images had the same background. Furthermore, the generation of augmented views and coloured object outputs is yet to be achieved. Another future enhancement could be the inclusion of top-aligned and bottom-aligned images to make a completely rotatable model.

Conclusion
As part of the big leap in the image processing domain, volumetric estimation and reconstruction algorithms are becoming popular. As a preliminary attempt to automate the process of three-dimensional object reconstruction, we suggest a framework in this study. Even though we were not able to produce a complete triple-axis rotatable model, owing to the unavailability of such images in the dataset, we were able to replicate the primitive shape and features of the input objects using a series of images. The suggested system can act as a replacement for manual design processes, at least at the initial stages. We conclude that obtaining three-dimensional models is possible when a set of images around the object is given.

Data Availability
The data used to support the findings of this study are freely available at https://www.cs.columbia.edu/CAVE/software/softlib/coil-100.php.

Disclosure
The funder had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Conflicts of Interest
The authors declare that there are no conflicts of interest.

Mathematical Problems in Engineering