3D MODEL RECONSTRUCTION USING GAN AND 2.5D SKETCHES FROM 2D IMAGE

ABSTRACT


INTRODUCTION
Currently, in order to simulate an object visually and vividly, helping the observer obtain a more complete and detailed view of the object and interact with it effectively, 3D modeling is applied in many fields, such as medical imaging, scene creation and character building in cinema, architectural design, and 3D printing [1] [2]. In some fields, 3D shape reconstruction of objects has been performed successfully using specialized equipment that captures images of objects from different angles [3] [4] [5]. Reconstructing 3D shapes directly from a single image, however, requires full knowledge of the specific 3D geometry of the object. This poses a challenge, because 3D object information is very diverse in real images; previous research directions have focused entirely on synthetic data [5] [6] [7], so they are often affected by the problem of data-domain adaptation: direct conversion from 2D to 3D yields imperfect results, and the reconstructed shape does not reach the best efficiency. The 3D shape reconstruction methods have shown that if

Estimate 2.5D sketch from 2D image
This is the first component of the model (Figure 2a), which estimates the 2.5D sketch of the object from a 2D RGB image. Inspired by the approach of MarrNet [13], this component is based on the ResNet-18 residual network architecture [14]. The encoder uses ResNet-18 with the kernel size of the Conv1 layer modified from 7×7 to 3×3, with a stride of 2 and padding of 1, in order to reduce noise and smooth the image during convolution, encoding a 256×256 RGB image into 512 feature maps of size 8×8. The decoder consists of four deconvolution layers with a kernel size of 5×5, a stride of 2, and padding of 2. The result is a 2.5D sketch with surface-normal, depth, and mask information (Figure 2b) at the same 256×256 resolution.
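The 256×256 → 8×8 reduction in the encoder follows from the standard convolution size formula. The sketch below works through that arithmetic, assuming a standard ResNet-18 layout (the modified 3×3/stride-2 Conv1, a stride-2 max-pool, then four residual stages of which the last three downsample by 2); only the overall 256 → 8 reduction and the Conv1 modification are stated in the text, so the per-layer schedule is an assumption.

```python
def conv_out(size, kernel, stride, padding):
    """Output spatial size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def encoder_sizes(size=256):
    """Trace the feature-map size through the assumed encoder layers."""
    sizes = [size]
    size = conv_out(size, kernel=3, stride=2, padding=1)  # modified Conv1: 256 -> 128
    sizes.append(size)
    size = conv_out(size, kernel=3, stride=2, padding=1)  # max-pool: 128 -> 64
    sizes.append(size)
    for _ in range(3):                                    # stages 2-4 each downsample by 2
        size = conv_out(size, kernel=3, stride=2, padding=1)
        sizes.append(size)
    return sizes

print(encoder_sizes())  # [256, 128, 64, 32, 16, 8] -> 512 feature maps of size 8x8
```

The same formula applied to the decoder's transposed convolutions runs in reverse, expanding the 8×8 maps back toward the 256×256 sketch resolution.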

Estimating 3D shapes from 2.5D sketches
The second component of the model (Figure 2b) estimates the 3D object shape from the 2.5D sketch estimated in the previous step (Figure 2a). Since it takes only the surface-normal, depth, and mask information as input, the model can be trained on synthetic datasets without the domain-adaptation problem, because it is much easier to render 2.5D sketches than photorealistic 2D images. Inspired by the TL embedding network and the 3D-VAE-GAN network presented in Section 2, the 3D shape estimation model (Figure 4) is an encoder-decoder network used to predict 3D shapes from the 2.5D sketch. The encoder is also adapted from ResNet-18, performing convolution with a 3×3 kernel, a stride of 2, and padding of 1 to encode the 2.5D sketch into a 200-dimensional hidden vector. This vector then passes through a decoder consisting of five 3D deconvolution layers with stride and padding varying across the layers.
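The text fixes only the layer count (five 3D deconvolutions) and the 128×128×128 output of the voxel decoder, so the kernel/stride/padding schedule below is an illustrative assumption showing one way the numbers can work out, using the transposed-convolution size formula.

```python
def deconv_out(size, kernel, stride, padding):
    """Output size of a transposed (de)convolution layer."""
    return (size - 1) * stride - 2 * padding + kernel

# Assumption: the 200-d latent vector is reshaped to a 1x1x1 volume
# with 200 channels, then upsampled by five transposed-conv layers.
layers = [
    (8, 1, 0),  # 1  -> 8
    (4, 2, 1),  # 8  -> 16
    (4, 2, 1),  # 16 -> 32
    (4, 2, 1),  # 32 -> 64
    (4, 2, 1),  # 64 -> 128
]

size = 1
for kernel, stride, padding in layers:
    size = deconv_out(size, kernel, stride, padding)
print(size)  # 128 -> a 128x128x128 voxel grid
```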

Fine-tune the accuracy of predicted 3D shapes
Because the 3D shape obtained from the 2.5D sketch in the previous step will not be highly accurate, we refine this shape using the 3D-GAN model [11]. In this way, the model increases the accuracy of the final 3D shape. The idea is to build a discriminator that checks the 3D shape created in step 2.
The difference between the proposed method and MarrNet: both approaches are inspired by the TL embedding network and the 3D-VAE-GAN network and rely on the 2.5D sketch as an intermediary. However, MarrNet uses only the neural network and the loss functions corresponding to the three kinds of information in the 2.5D sketch to refine the estimated 3D shape, whereas we use a GAN model to enhance the quality of the 3D shape. Compared with a conventional neural network, the GAN model is better at fine-tuning tasks. Therefore, by creating a discriminator in the GAN model, the proposed model can go from the 2D RGB image to a refined 3D shape estimated from this image.

First, we use a pretrained 3D-GAN [11] network to decide whether the 3D shape created in step 2 of the model is realistic. Its sample generator synthesizes a 3D shape from a randomly drawn vector, and its discriminator separates generated shapes from real shapes, as shown schematically in Figure 5. Consequently, the discriminator is capable of modeling real shape distributions and can be used as the loss function of the model. The generator does not participate in the final training of the model. The sample generator produces a 3D shape from a random input vector passed through five 3D deconvolution layers with stride and padding varying across the layers, each followed by batch normalization and ReLU, with a final sigmoid layer to create a 128×128×128 voxel shape. The discriminator uses five 3D convolution layers with leaky ReLU to distinguish the 3D shapes produced by the sample generator from the actual shapes.

The 2.5D sketch estimation network was trained with a loss function that is the sum of the errors of the three outputs (surface normals, depth, and mask), using stochastic gradient descent (SGD) with a learning rate of 1e-3 for 300 epochs, with optimization according to Adam's algorithm.
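The sketch-estimation loss described above sums the errors of the three output maps. A minimal sketch of that combination, with flat lists standing in for the predicted and ground-truth maps; using mean squared error for each map is our assumption, since the text does not name the per-map error.

```python
def mse(pred, target):
    """Mean squared error between two flat maps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def sketch_loss(pred, target):
    """Sum of the surface-normal, depth, and mask reconstruction errors."""
    return sum(mse(pred[k], target[k]) for k in ("normal", "depth", "mask"))

# Toy 2-pixel "maps" for illustration.
pred = {"normal": [0.1, 0.9], "depth": [0.5, 0.5], "mask": [1.0, 0.0]}
target = {"normal": [0.0, 1.0], "depth": [0.5, 0.4], "mask": [1.0, 0.0]}
print(round(sketch_loss(pred, target), 3))  # -> 0.015
```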

Jurnal Teknologi Informasi dan Pendidikan
For the 3D shape estimation network, we use the cross-entropy loss function at this stage, still with the SGD algorithm [15], a learning rate of 1e-3, a momentum of 0.9, and 80 epochs. Finally, the 3D-GAN model refines the 3D shape; because of the high dimensionality of the 3D shape (128×128×128), training the GAN becomes unstable. To address this issue, we use the Wasserstein GAN loss [16] [17].
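The cross-entropy loss for the shape estimator can be read as a per-voxel binary cross-entropy between predicted occupancy probabilities and the binary ground-truth grid. A minimal sketch under that assumption (the text does not spell out the exact form), with flat lists standing in for the voxel grids:

```python
import math

def voxel_cross_entropy(pred, gt, eps=1e-7):
    """Mean binary cross-entropy over voxels.

    pred: predicted occupancy probabilities in [0, 1].
    gt:   binary ground-truth occupancies.
    """
    total = 0.0
    for p, g in zip(pred, gt):
        p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
        total += -(g * math.log(p) + (1 - g) * math.log(1 - p))
    return total / len(pred)

pred = [0.9, 0.2, 0.8, 0.1]
gt   = [1,   0,   1,   0]
print(voxel_cross_entropy(pred, gt))
```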
L_WGAN = E_x̃[D(x̃)] - E_x[D(x)] + λ_gp E_x̂[(‖∇D(x̂)‖ - 1)²]    (1)

where D is the discriminator, x̃ and x are the estimated 3D shape and the actual 3D shape, respectively, and x̂ denotes samples interpolated between them for the gradient penalty. The discriminator tries to minimize this loss function while the sample generator tries to maximize it. From formula (1), we further define the precision error L_precision = -E_x̃[D(x̃)], which enhances the performance of the model, where x̃ is the complete reconstruction from this network.

The 3D shape refinement network is trained on the sample generator G and the discriminator D, using Adam [18] optimization with a learning rate of 1e-4 and a batch size of 4 for 80 epochs. Network D is trained on real samples, fake samples, and the gradient penalty as in equation (1). The complete model is trained jointly with the 3D shape estimation network and the discriminator D of the 3D shape refinement network. This model uses a loss function that is the sum of the 3D shape estimation loss and the precision error: L = L_shape + λ L_precision. In the experiment, we choose λ = 10⁻⁴ for the best results; this model also uses the SGD algorithm and runs for 80 epochs. The details of training are shown in Algorithm 1.
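The interplay of the critic loss and the precision error can be sketched numerically, assuming the standard WGAN-GP form for equation (1). The toy linear critic and λ_gp = 10 below are illustrative assumptions standing in for the 3D convolutional discriminator and its tuned penalty weight.

```python
def critic(x, w=0.5):
    """Toy scalar critic standing in for the 3D convolutional D."""
    return w * x

def wgan_gp_loss(fake, real, grad_norms, lambda_gp=10.0):
    """E[D(fake)] - E[D(real)] + lambda_gp * gradient penalty."""
    e_fake = sum(critic(x) for x in fake) / len(fake)
    e_real = sum(critic(x) for x in real) / len(real)
    penalty = sum((g - 1.0) ** 2 for g in grad_norms) / len(grad_norms)
    return e_fake - e_real + lambda_gp * penalty

def precision_error(fake):
    """L_precision = -E[D(fake)], pushed back to the generator side."""
    return -sum(critic(x) for x in fake) / len(fake)

fake, real = [0.2, 0.4], [0.9, 1.1]
grads = [1.0, 1.0]  # critic gradient norms at interpolated samples
print(wgan_gp_loss(fake, real, grads), precision_error(fake))
```

With gradient norms at exactly 1 the penalty term vanishes, which is the regime the penalty pushes the critic toward.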

Evaluation Metric
We measure reconstruction quality with the Intersection over Union (IoU) between the predicted and ground-truth voxel grids: IoU = |V_p ∩ V_gt| / |V_p ∪ V_gt|, where V_p and V_gt are the resulting 3D shape of the model and the actual shape of the object, respectively.
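The voxel IoU metric above can be computed directly on binary occupancy grids; a minimal sketch with flat lists standing in for the 128³ grids:

```python
def iou(pred, gt):
    """Intersection over Union between two binary voxel grids."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0

pred = [1, 1, 0, 1, 0, 0]
gt   = [1, 0, 0, 1, 1, 0]
print(iou(pred, gt))  # 2 voxels overlap out of 4 occupied -> 0.5
```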

ShapeNet Core55 dataset
The ShapeNet Core55 dataset [19] was developed by researchers at Stanford University, Princeton University, and the Toyota Technological Institute at Chicago. The total number of objects across the training and testing datasets is 5,652, of which 1,816 are chairs and 1,906 are cars; the full breakdown is given in Table 1.
Compared with other methods, our method achieves fairly good IoU for rectangular objects such as chairs and cars; however, it gives poor results for the airplane object. From Table 2 we can see that the model performs well on rectangular shapes such as chairs and cars, but still performs poorly on non-rectangular shapes. Other methods train on synthetic data and focus on isolated objects; when the training data is synthetic and testing is performed on real data, there is also a significant discrepancy. Figure 6 shows that when L_precision is not used, the resulting 3D model is not as smooth as when L_precision is used. To visualize the created 3D model, we use the Blender software to render the generated 3D shape, as shown in Figure 5.

CONCLUSION AND DEVELOPMENT ORIENTATION
In this paper, we built a model to reconstruct the 3D shape of an object through a 2.5D sketch. With the addition of a transition from a 2D still image to a 2.5D sketch, the model is divided into several stages, each with a specific purpose. Thanks to that, the 3D shape of the object is reconstructed in the best possible way. Experiments show that our method improves the quality of the generated 3D shapes compared with previous methods.