Deep-Learning-Empowered 3D Reconstruction for Dehazed Images in IoT-Enhanced Smart Cities

Abstract: With an increasing number of smart cameras deployed in infrastructure and commercial buildings, 3D reconstruction can quickly obtain information about cities and improve the efficiency of government services. Images collected in outdoor hazy environments are prone to color distortion and low contrast; thus, the desired visual effect cannot be achieved and target detection becomes more difficult. Artificial intelligence (AI) solutions, which can automatically identify patterns or monitor the environment, provide great help for dehazing images. Therefore, we propose a 3D reconstruction method of dehazed images for smart cities based on deep learning. First, we propose a fine transmission image deep convolutional regression network (FT-DCRN) dehazing algorithm that uses the fine transmission image and the atmospheric light value to compute the dehazed image. The DCRN is used to obtain the coarse transmission image; it not only expands the receptive field of the network but also retains the features to maintain the nonlinearity of the overall network. The fine transmission image is obtained by refining the coarse transmission image using a guided filter. The atmospheric light value is estimated according to the position and brightness of the pixels in the original hazy image. Second, we use the dehazed images generated by the FT-DCRN dehazing algorithm for 3D reconstruction. An advanced relaxed iterative fine matching based on the structure from motion (ARI-SFM) algorithm is proposed. The ARI-SFM algorithm, which obtains the fine matching corner pairs and reduces the number of iterations, establishes an accurate one-to-one matching corner relationship. The experimental results show that our FT-DCRN dehazing algorithm improves the accuracy compared to other representative algorithms. In addition, the ARI-SFM algorithm guarantees the precision and improves the efficiency.


Introduction
Artificial intelligence (AI) has recently become very popular, and a wide range of applications use this technique [1]. There are many smart systems based on deep learning for social services, such as smart cities and smart transportation. Smart cities can use AI technology such as machine learning, deep learning and computer vision to save money and improve the quality of life of residents [2]. Through using AI or machine learning technology to perform intelligent image processing, we can obtain important geographical region data. Such real-time data can be continuously monitored through AI technology, which would further develop cities' governance and planning [3].
In hazy environments, the reflected light of an object is attenuated before it reaches the camera or monitoring equipment, which degrades the quality of outdoor images [4]. Therefore, obtaining dehazed images in smart cities is an important problem to be solved by AI technology. In recent years, with the rapid development of AI technology, several learning-based dehazing algorithms have been proposed. Tang et al. [5] used the random forest algorithm to remove haze. Although the accuracy of the transmission image was improved, the texture features in the image were not used, which limits the method, and the dehazing effect is not ideal. Cai et al. [6] adopted a convolutional neural network to learn the features of hazy images and estimate the transmission image. However, this network uses only a single scale for feature extraction, which makes it prone to color distortion, detail loss and excessive dehazing in many specific scenes. Li et al. [7] proposed AOD-Net, which is based on a convolutional neural network, to dehaze images. Through a mathematical transformation, this algorithm avoids using additional methods to estimate the atmospheric light, and its network structure is relatively simple.
We propose a fine transmission image deep convolutional regression network (FT-DCRN) dehazing algorithm that uses fine transmission image [8] and atmospheric light value [9] to compute dehazed image. First, this paper proposes a deep convolutional regression network (DCRN) to obtain the coarse transmission image. The DCRN can not only expand the receptive field of the network, but it can also retain the features to maintain the nonlinearity of the overall network [10]. Second, the fine transmission image is obtained by refining the coarse transmission image using a guided filter [11]. The guided filter is used to optimize the coarse transmission image to improve the accuracy of dehazed images. Furthermore, the atmospheric light value is estimated according to the position and brightness of the pixels in the original hazy image. According to the obtained fine transmission image and atmospheric light value, the dehazed image is inverted using the atmospheric physical scattering model [12].
With increasingly more smart cameras deployed in infrastructure and commercial buildings, 3D reconstruction can quickly obtain information on cities and geographical regions [13]. It is important to solve the image matching problem using structure from motion (SFM) 3D reconstruction algorithms [14]. Feature detection and feature matching are subordinate image matching problems. Hossain et al. [15] proposed a CADT corner detection algorithm, which effectively reduces the positioning error and improves the average repeatability. Zhang et al. [16] proposed a Harris SIFT algorithm including illumination compensation, which not only improves the matching accuracy but also improves the real-time performance of the algorithm. However, the above image matching algorithms are not universal and cannot accurately extract image feature points under special lighting conditions. Zhou et al. [17] proposed a registration algorithm based on geometric invariance and local similar features, but it relies heavily on rough matching, and the correct matching points are eliminated. To solve the problems of the above algorithms, an advanced relaxed iterative fine matching based on the SFM (ARI-SFM) algorithm is proposed. The ARI-SFM algorithm, which obtains the fine matching corner pairs and reduces the number of iterations, establishes an accurate one-to-one corner matching relationship.
The contributions of this paper are listed as follows: (a) We use a deep learning algorithm for dehazed images in smart cities. The FT-DCRN dehazing algorithm is proposed, which uses fine transmission image and atmospheric light value to compute dehazed image. First, this paper proposes a DCRN algorithm to obtain the coarse transmission image. The DCRN can not only expand the receptive field of the network, but can also retain the features to maintain the nonlinearity of the overall network. Second, the fine transmission image is obtained by refining the coarse transmission image using a guided filter. The guided filter is used to optimize the coarse transmission image to improve the accuracy of dehazed images. (b) We perform 3D reconstruction using the dehazed images generated from the FT-DCRN algorithm. The ARI-SFM algorithm is proposed, which can obtain fine matching corner pairs and reduce the number of iterations. Compared with other representative algorithms, the ARI-SFM algorithm establishes an accurate one-to-one corner matching relationship, which guarantees the precision and improves the efficiency.

FT-DCRN Dehazing Algorithm
The purpose of a dehazing algorithm is to restore a sharp image from an image blurred by haze. Deep learning algorithms, which can automatically identify patterns or monitor the environment, can provide great help for dehazing images [18]. In this paper, we propose an FT-DCRN dehazing algorithm to obtain useful images for 3D reconstruction. The steps of the FT-DCRN dehazing algorithm are as follows:
Step 1: Obtain the coarse transmission image. The hazy images are input, and the coarse transmission image is obtained by using the DCRN dehazing algorithm.
Step 2: Compute the fine transmission image. The fine transmission image is obtained by refining the coarse transmission image using a guided filter.
Step 3: Estimate the atmospheric light value. The atmospheric light value [19] is estimated according to the position and brightness of the pixels in the original hazy image.
Step 4: Compute the dehazed image. According to the obtained fine transmission image and atmospheric light value, the dehazed image is inverted using the atmospheric physical scattering model.
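Step 4 can be illustrated with a minimal sketch, assuming the standard atmospheric scattering model I = J·t + A·(1 − t), so that the dehazed value is J = (I − A)/t + A. The function name `recover_scene`, the lower bound `t_min` on the transmission and the clipping to [0, 255] are assumptions made for illustration and are not taken from the paper.

```python
def recover_scene(hazy, transmission, airlight, t_min=0.1):
    """Invert I = J*t + A*(1 - t) for J, pixel by pixel.

    hazy         : 2D list of gray values in [0, 255]
    transmission : 2D list of fine transmission values in [0, 1]
    airlight     : scalar atmospheric light value A
    t_min        : assumed lower bound on t, to avoid dividing by ~0
    """
    dehazed = []
    for row_i, row_t in zip(hazy, transmission):
        out_row = []
        for i_val, t_val in zip(row_i, row_t):
            t_safe = max(t_val, t_min)
            j_val = (i_val - airlight) / t_safe + airlight
            # Keep the result inside the valid intensity range.
            out_row.append(min(255.0, max(0.0, j_val)))
        dehazed.append(out_row)
    return dehazed


# A pixel with J = 100 seen through t = 0.5 with A = 200 gives I = 150.
restored = recover_scene([[150.0]], [[0.5]], airlight=200.0)
```

Bounding the transmission from below is a common safeguard: where t is close to zero, the division would otherwise amplify noise in the hazy image.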

DCRN Dehazing Algorithm
To obtain a coarse transmission image, this paper proposes a DCRN dehazing algorithm, and the overall network structure is shown in Fig. 1. The DCRN is an end-to-end network based on a convolutional neural network [20] that inputs hazy images and outputs corresponding coarse transmission images.
The DCRN is similar to the encoder-decoder network. The core unit of the encoder network is the convolutional unit (Conv), which is mainly composed of a convolutional layer [21], an ReLU, a pooling layer and a batch normalization (BN) layer [22]. The core unit of the decoder network is the deconvolutional unit (DeConv), which is mainly composed of a deconvolutional layer, a BN layer, an ReLU and a convolutional layer. The fully connected (FC) layer [23] is replaced by the convolutional layer. The features of the overall network are extracted by the encoder network. The decoder network is used to ensure the size of the output transmission image and retain the features to maintain the nonlinearity of the overall network [24]. Our DCRN can not only expand the receptive field of the network, but it can also ensure that the overall network has a certain nonlinear learning ability.
The main characteristic of an end-to-end network is that the input and output of the network are identical in size. However, due to the use of two pooling layers [25] in the encoder network, the feature set is smaller, and the original image information is lost.
To solve the problem of information loss from the original image, this paper uses a deconvolutional layer [26] to replace the upsampling layer, which not only increases the size of the feature set but also produces a dense feature set with a larger spatial structure. The "Upconv4" unit is shown in the red box of Fig. 1; it contains the deconvolutional layer, a convolutional layer and a BN layer. The deconvolutional layer is often used in dense mapping estimation problems. Furthermore, the cross-convolutional layer used in the DeConv unit not only provides the DCRN with a multiscale feature learning ability but also avoids the vanishing gradient problem in the backpropagation process. Therefore, the DCRN can estimate the coarse transmission image more accurately.
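The effect of the two pooling layers and of the deconvolutional layers on the feature-map size can be checked with simple arithmetic. The sketch below uses the usual output-size formulas for convolution/pooling and transposed convolution; the 2×2 kernels, the stride of 2 and the use of exactly two deconvolutional layers are assumptions, since the text does not state the layer hyperparameters.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Spatial size after a transposed convolution (deconvolution) layer."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def trace_sizes(size):
    """Follow one spatial dimension through two poolings and two deconvs."""
    sizes = [size]
    for _ in range(2):  # encoder: two 2x2 poolings with stride 2 (assumed)
        size = conv_out(size, kernel=2, stride=2)
        sizes.append(size)
    for _ in range(2):  # decoder: two 2x2 deconvs with stride 2 (assumed)
        size = deconv_out(size, kernel=2, stride=2)
        sizes.append(size)
    return sizes


sizes = trace_sizes(128)
```

Under these assumed settings, each pooling halves the spatial size and each deconvolution doubles it, so the output transmission image recovers the size of the input.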

Fine Transmission Image
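In this paper, a guided filter is used to optimize the coarse transmission image to improve the accuracy of dehazed images [27]. The guided filter can be defined as:

$$q_i = a_k M_i + b_k, \quad \forall i \in \omega_k$$

where $q$ is the output image, which is the fine transmission image obtained after optimization, $M$ is the guidance image, and $a_k$ and $b_k$ are the coefficients of the window $\omega_k$. The coefficients are determined by minimizing the following cost function over the window:

$$E(a_k, b_k) = \sum_{i \in \omega_k} \left( (a_k M_i + b_k - p_i)^2 + \varepsilon a_k^2 \right)$$

where $p$ is the input image to be filtered and $\varepsilon$ is the regularization parameter. A linear regression was used to obtain the following results:

$$a_k = \frac{\frac{1}{|\omega|} \sum_{i \in \omega_k} M_i p_i - u_k \bar{p}_k}{\sigma_k^2 + \varepsilon}$$

$$b_k = \bar{p}_k - a_k u_k$$

where $u_k$ and $\sigma_k^2$ are the average and variance of image $M$ in the current window $\omega_k$, respectively, $|\omega|$ is the number of pixels in the window, and $\bar{p}_k$ is the average value of $p$ in window $\omega_k$.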
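The refinement can be sketched in a few lines of plain Python. This is a minimal illustration of the standard guided filter, assuming a grayscale guidance image and square windows clipped at the image borders; the function names, the radius `r` and the default `eps` are assumptions. As in the usual implementation, the coefficients of all windows covering a pixel are averaged before the output is formed.

```python
def window_mean(img, r):
    """Mean of img over a (2r+1)x(2r+1) window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            total = sum(img[yy][xx] for yy in ys for xx in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out


def guided_filter(guide, p, r=1, eps=1e-3):
    """Refine p (coarse transmission) using guide (guidance image M)."""
    h, w = len(p), len(p[0])
    prod_mp = [[guide[y][x] * p[y][x] for x in range(w)] for y in range(h)]
    prod_mm = [[guide[y][x] ** 2 for x in range(w)] for y in range(h)]
    mean_m, mean_p = window_mean(guide, r), window_mean(p, r)
    mean_mp, mean_mm = window_mean(prod_mp, r), window_mean(prod_mm, r)

    a = [[0.0] * w for _ in range(h)]
    b = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            var = max(mean_mm[y][x] - mean_m[y][x] ** 2, 0.0)
            cov = mean_mp[y][x] - mean_m[y][x] * mean_p[y][x]
            a[y][x] = cov / (var + eps)                    # a_k
            b[y][x] = mean_p[y][x] - a[y][x] * mean_m[y][x]  # b_k

    # Average the coefficients of all windows that cover each pixel.
    mean_a, mean_b = window_mean(a, r), window_mean(b, r)
    return [[mean_a[y][x] * guide[y][x] + mean_b[y][x] for x in range(w)]
            for y in range(h)]


guide = [[0.1, 0.9, 0.2], [0.8, 0.3, 0.7], [0.4, 0.6, 0.5]]
flat = [[0.5] * 3 for _ in range(3)]
refined = guided_filter(guide, flat)
```

A constant input is a useful sanity check: the covariance with the guide is zero, so every a_k vanishes and the output equals the input.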

Atmospheric Light Value
When estimating the atmospheric light value, He et al. [28] selected the brightest one percent of the pixels in the hazy image and then used the average brightness of these pixels as the atmospheric light value. This method is effective in most cases, but when a large white area appears in the image, it does not estimate the atmospheric light value accurately, which leads to image color distortion.
To solve this problem, this paper combines the pixel position and brightness to estimate the atmospheric light value. The relative height of each pixel is defined as $H(x, y)$, and the brightness value is $V(x, y)$. The probability $P(x, y)$ of a pixel being located in the white area is defined in terms of $H(x, y)$ and $V(x, y)$. The pixels whose probability value $P(x, y)$ is among the top one percent are selected, and their average brightness value is used as the atmospheric light value.
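The selection rule can be sketched as a generic "top one percent by score" procedure. The baseline ranks pixels by brightness alone, as in He et al. [28]. Because the expression for P(x, y) is not reproduced here, `height_weighted_score` is only a hypothetical stand-in that multiplies a relative height (1 at the top row, 0 at the bottom row) by the normalized brightness; it is not the paper's formula.

```python
def estimate_airlight(brightness, score=None, fraction=0.01):
    """Average brightness of the pixels with the highest score.

    brightness : 2D list of brightness values V(x, y) in [0, 255]
    score      : 2D list used for ranking; defaults to brightness itself
    fraction   : share of pixels to keep (top one percent by default)
    """
    if score is None:
        score = brightness  # baseline: rank by brightness only
    pixels = [(s, v) for s_row, v_row in zip(score, brightness)
              for s, v in zip(s_row, v_row)]
    pixels.sort(key=lambda sv: sv[0], reverse=True)
    count = max(1, int(len(pixels) * fraction))
    return sum(v for _, v in pixels[:count]) / count


def height_weighted_score(brightness):
    """Hypothetical score: relative height times normalized brightness."""
    h = len(brightness)
    return [[(1.0 - y / max(1, h - 1)) * v / 255.0 for v in row]
            for y, row in enumerate(brightness)]


# 10x10 scene: a white object (255) on the bottom row, hazy sky (200) on top.
scene = [[100.0] * 10 for _ in range(10)]
scene[9][5] = 255.0
scene[0][2] = 200.0
baseline = estimate_airlight(scene)
weighted = estimate_airlight(scene, score=height_weighted_score(scene))
```

In this toy scene the brightness-only baseline picks the white object near the ground, while the position-aware score prefers the bright pixel near the top of the image.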

ARI-SFM Algorithm
It is important to solve the image matching problem in the 3D SFM reconstruction algorithm. In the image matching process, the coarse matching relationship between corners is established by using the zero mean normalized cross-correlation method [29]. This method only builds a one-to-many set of matching corner pairs, so there are many ambiguous and incorrect matching pairs. We propose an ARI-SFM algorithm to guarantee the precision and improve the efficiency.

ARI Algorithm
To establish an accurate one-to-one corner matching relationship, an advanced relaxed iterative (ARI) algorithm is proposed, which obtains fine matching corner pairs and reduces the number of iterations. The flowchart of the ARI algorithm is shown in Fig. 2. The steps of the ARI algorithm are as follows:
Step 1: Calculate the matching strength of coarse matching pairs. The matching strength is used as the indicator for fine matching corner selection [30].
Step 2: Judge the uniqueness of the corner pairs. The matching pairs are sorted by matching strength in descending order. For each corner, we select the candidate pairs with the largest and second largest matching strengths $S_M(p_{1i}, q_{2j})$. We then calculate $S_P(p_{1i}, q_{2j})$ and use it to measure the uniqueness of the corner matching. The value of $S_P$ ranges from 0 to 1. All matching pairs in the set are sorted according to $S_M(p_{1i}, q_{2j})$ and $S_P(p_{1i}, q_{2j})$. If all corners are in the top 60% of both rankings, the corner pairs are accurate matching pairs, and the algorithm proceeds to Step 4; otherwise, it proceeds to Step 3.
Step 3: Delete the correctly matched corners obtained by fine matching, and return to Step 1.
Step 4: Output the fine matching corner pairs.
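The loop formed by Steps 1 to 4 can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the matching strengths are passed in as a fixed dictionary instead of being recomputed in every iteration, and the uniqueness score is assumed to be S_P = 1 − (second largest strength)/(largest strength), a form used in classic relaxation matching that is not confirmed by the text. All function names are hypothetical.

```python
import math


def best_per_corner(strengths):
    """For each corner p, return (best q, S_M, assumed S_P)."""
    by_p = {}
    for (p, q), s in strengths.items():
        by_p.setdefault(p, []).append((s, q))
    result = {}
    for p, cands in by_p.items():
        cands.sort(reverse=True)
        best_s, best_q = cands[0]
        second_s = cands[1][0] if len(cands) > 1 else 0.0
        s_p = 1.0 - second_s / best_s if best_s > 0 else 0.0
        result[p] = (best_q, best_s, s_p)
    return result


def top_fraction(items, key, keep):
    """Set of the items ranked in the top `keep` share by `key`."""
    ranked = sorted(items, key=key, reverse=True)
    return set(ranked[:max(1, math.ceil(len(ranked) * keep))])


def relaxed_iterative_matching(strengths, keep=0.6):
    """Return a one-to-one mapping {p: q} from one-to-many candidates."""
    remaining, matches = dict(strengths), {}
    while remaining:
        best = best_per_corner(remaining)
        strong = top_fraction(list(best), lambda p: best[p][1], keep)
        unique = top_fraction(list(best), lambda p: best[p][2], keep)
        accepted = strong & unique  # in the top 60% of both rankings
        if not accepted:
            break
        used_q = set()
        for p in sorted(accepted, key=lambda p: best[p][1], reverse=True):
            q = best[p][0]
            if q not in used_q:  # enforce one-to-one matching
                matches[p] = q
                used_q.add(q)
        # Remove every candidate that involves an already matched corner.
        remaining = {(p, q): s for (p, q), s in remaining.items()
                     if p not in matches and q not in used_q}
    return matches


candidates = {("p1", "q1"): 0.9, ("p1", "q2"): 0.1,
              ("p2", "q2"): 0.8, ("p2", "q1"): 0.3,
              ("p3", "q3"): 0.5, ("p3", "q1"): 0.45}
pairs = relaxed_iterative_matching(candidates)
```

In the example, p1 and p2 are accepted in the first pass; p3 is ambiguous at first and becomes unique only after q1 has been removed from the candidate set.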

Calculation of Matching Strength
The initial matching corner pairs are represented as $(p_{1i}, q_{2j})$, where $p_{1i}$ is a corner of image $I_1$ and $q_{2j}$ is a corner of image $I_2$. $N(p_{1i})$ and $N(q_{2j})$ are neighborhoods of radius $R$ centered at points $p_{1i}$ and $q_{2j}$, respectively. If $(p_{1i}, q_{2j})$ is a correct matching pair, there must be more correct matching pairs $(p_{1k}, q_{2f})$ in its neighborhoods $N(p_{1i})$ and $N(q_{2j})$, where $p_{1k} \in N(p_{1i})$ and $q_{2f} \in N(q_{2j})$ satisfy the conditions for calculating the matching strength.

Condition 2: The angle between the vectors $\overrightarrow{p_{1i} p_{1k}}$ and $\overrightarrow{q_{2j} q_{2f}}$ is less than 90 degrees.
If the matching corner pairs $(p_{1i}, q_{2j})$ and $(p_{1k}, q_{2f})$ satisfy the above two conditions, the matching strength is calculated using formula (7), in which $Similar_{ij}$ and $Similar_{kf}$ are the gray cross-correlation values of the matching corner pairs $(p_{1i}, q_{2j})$ and $(p_{1k}, q_{2f})$, respectively, and $dist$ is the average distance of the corner pairs. The expression of $\delta$ is given in formula (8), in which $r$ is the relative distance deviation of the corner pair. The similarity contribution $\delta$ of $(p_{1k}, q_{2f})$ to $(p_{1i}, q_{2j})$ is a power function with a negative exponent in the relative distance deviation $r$, so $\delta$ decreases monotonically with $r$. When $r$ is very large, the matching corner pair $(p_{1k}, q_{2f})$ is ignored.
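The quantities described above can be sketched as follows. Condition 2 is checked with a dot product, which is positive exactly when the angle between two nonzero vectors is below 90 degrees. Formulas (7) and (8) are not reproduced in the text, so the exponential form of `delta`, the cut-off `eps_r` and the `1 / (1 + dist)` weighting in `support` are assumptions borrowed from the classic relaxation-matching formulation and serve only as an illustration.

```python
import math


def angle_below_90(p_i, p_k, q_j, q_f):
    """Condition 2: angle between p_i->p_k and q_j->q_f is < 90 degrees."""
    v1 = (p_k[0] - p_i[0], p_k[1] - p_i[1])
    v2 = (q_f[0] - q_j[0], q_f[1] - q_j[1])
    return v1[0] * v2[0] + v1[1] * v2[1] > 0


def relative_deviation(p_i, p_k, q_j, q_f):
    """Return (r, dist): relative distance deviation and average distance."""
    d1 = math.dist(p_i, p_k)
    d2 = math.dist(q_j, q_f)
    dist = (d1 + d2) / 2.0
    r = abs(d1 - d2) / dist if dist > 0 else 0.0
    return r, dist


def delta(r, eps_r=0.3):
    """Assumed contribution: exp(-r / eps_r), ignored once r >= eps_r."""
    return math.exp(-r / eps_r) if r < eps_r else 0.0


def support(sim_ij, sim_kf, p_i, p_k, q_j, q_f, eps_r=0.3):
    """Assumed support of neighbor pair (p_k, q_f) for pair (p_i, q_j)."""
    if not angle_below_90(p_i, p_k, q_j, q_f):
        return 0.0
    r, dist = relative_deviation(p_i, p_k, q_j, q_f)
    return sim_ij * sim_kf * delta(r, eps_r) / (1.0 + dist)


# A neighbor pair with identical geometry in both images: r = 0, dist = 5.
value = support(0.8, 0.5, (0, 0), (3, 4), (10, 10), (13, 14))
```

Summing such contributions over the neighbor pairs of a candidate gives one possible matching strength; neighbors whose relative distance deviation is large contribute nothing.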

Experiments
In this paper, synthetic hazy images and real hazy images are used to train and test the performance of the FT-DCRN dehazing algorithm. First, we adopt the Make3D dataset [31] (http://make3d.cs.cornell.edu/data.html) to synthesize hazy images using the atmospheric scattering model. We selected 900 pairs of hazy and sharp images as training samples. Second, we take 1200 real hazy images of outdoor scenes, such as those of buildings, gardens, and parking areas, to analyze the results of the FT-DCRN dehazing algorithm. The FT-DCRN dehazing algorithm runs on a GeForce RTX 2080Ti GPU and executes using Python.
To verify the efficiency and accuracy of the ARI-SFM algorithm, the algorithm is implemented on an experimental platform with 64-bit Windows 10, an Intel(R) Core(TM) i5-10210U @ 1.60 GHz CPU, and 8.00 GB of memory; the development platform is MATLAB R2018b.

Generation of Synthetic Hazy Images
Given a random value for the transmission image $t(x, y) \in [0, 1]$ and the atmospheric light value $A \in [200, 255]$, the synthetic hazy image $I(x, y)$ is generated by formula (9):

$$I(x, y) = J(x, y)\, t(x, y) + A\,(1 - t(x, y)) \tag{9}$$

where $J(x, y)$ represents the original sharp image. Fig. 3 shows the original sharp images, including images of a road, house, tree, and fountain, from the Make3D dataset. Fig. 4 shows the corresponding synthetic hazy images of the sharp images in the Make3D dataset.
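The synthesis step can be sketched directly from the atmospheric scattering model, I = J·t + A·(1 − t). For brevity the sketch draws a single random transmission value per image, which is an assumption (the transmission could equally vary per pixel), and the function name `synthesize_haze` is hypothetical.

```python
import random


def synthesize_haze(sharp, t=None, airlight=None, rng=random):
    """Apply I = J*t + A*(1 - t) to a sharp grayscale image.

    sharp    : 2D list of gray values J(x, y) in [0, 255]
    t        : transmission in [0, 1]; drawn at random if omitted
    airlight : atmospheric light A in [200, 255]; drawn at random if omitted
    """
    if t is None:
        t = rng.uniform(0.0, 1.0)
    if airlight is None:
        airlight = rng.uniform(200.0, 255.0)
    hazy = [[j * t + airlight * (1.0 - t) for j in row] for row in sharp]
    return hazy, t, airlight


sharp = [[100.0, 50.0], [0.0, 255.0]]
hazy, t_used, a_used = synthesize_haze(sharp, t=0.5, airlight=200.0)
```

Returning the sampled t and A alongside the hazy image keeps the ground truth available as a regression target when training a transmission estimator.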

Qualitative Evaluation
(1) Results of Synthetic Hazy Images
To verify the effect of the FT-DCRN dehazing algorithm on the synthetic hazy images, the results of the algorithm are compared with those of some representative algorithms. Because different learning-based algorithms have their own advantages, we adopt Tang's algorithm [5], Cai's algorithm [6] and Li's algorithm [7], which are described in the introduction.

(2) Results of Real Hazy Image
To verify the effect of the FT-DCRN dehazing algorithm on real hazy images, we analyze 1200 real hazy images of outdoor scenes, such as buildings, gardens and parking areas. We compare the results of the FT-DCRN with those of Tang's algorithm, Cai's algorithm and Li's algorithm. Fig. 9 shows the comparison of the dehazing results for the images of buildings. Tang's algorithm results in unclear boundaries for the buildings. Cai's algorithm easily produces color distortion, which makes the scene of the buildings look unreal. Li's algorithm changes the color of the white areas. Our approach produces clear boundaries and textures, and the overall colors of the images are close to the normal visual effect.

Quantitative Evaluation
To perform the quantitative evaluation, synthetic hazy images and real hazy images are selected. We adopt the structural similarity (SSIM) [32], peak signal-to-noise ratio (PSNR) [33] and information entropy (IE) [34] to evaluate the effect of the FT-DCRN dehazing algorithm. The SSIM is an indicator of the similarity of two images. When two images are the same, the SSIM is equal to 1. The PSNR is a statistical indicator that is based on the gray values of image pixels. The higher the PSNR is, the better the image restoration. The IE is a statistical measure of features that reflects the average amount of information in the image. The larger the entropy is, the clearer the image. The experimental results for the synthetic hazy image are shown in Tabs. 1-4.
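Two of the three indicators are simple enough to sketch in plain Python. The sketch below assumes 8-bit grayscale images stored as 2D lists; `psnr` uses the usual definition 10·log10(peak²/MSE), and `information_entropy` is the Shannon entropy of the gray-level histogram in bits. SSIM is omitted because it requires local windowed statistics. The function names are illustrative.

```python
import math
from collections import Counter


def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(reference, restored)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def information_entropy(image):
    """Shannon entropy (bits) of the gray-level distribution of an image."""
    values = [v for row in image for v in row]
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


reference = [[10, 20], [30, 40]]
restored = [[11, 21], [31, 41]]   # every pixel is off by one, so MSE = 1
score_db = psnr(reference, restored)
bits = information_entropy([[0, 255], [0, 255]])
```

An image with two equally frequent gray levels has an entropy of exactly one bit, which gives a convenient check of the entropy function.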

Results of 3D Reconstruction
Figs. 10a-10f include six images of a building taken from different perspectives in a real scene.
After the ARI-SFM algorithm is applied, the one-to-one relationship between corners is determined, and one-to-many relationships are almost eliminated. In this experiment, we selected the six images shown in Fig. 10 for 3D reconstruction. The final experimental results are shown in Fig. 11, which is the point cloud of the 3D reconstruction of the building. Fig. 11 shows that the ARI-SFM algorithm can accurately reconstruct the 3D building, and the signs on the building are clearly visible.

Performance of 3D Reconstruction
To verify the matching efficiency of the ARI-SFM algorithm, the results of the algorithm are compared with those of some representative algorithms. Because different image matching algorithms have their own advantages, we adopt Hossain's algorithm [15], Zhang's algorithm [16] and Zhou's algorithm [17], which are described in the introduction. The comparison of the matching results for the building is shown in Tab. 5. Tab. 5 shows that our approach has higher matching accuracy and a shorter matching time, which indicates that it guarantees the precision and improves the efficiency compared with the other algorithms. Our approach determines the one-to-one relationship between corners and almost eliminates one-to-many relationships, which yields fine matching corner pairs and reduces the number of iterations.

Conclusion
AI solutions can provide great help for dehazing images, which can automatically identify patterns or monitor the environment. Therefore, we propose a 3D reconstruction method for dehazed images for smart cities based on deep learning. First, we propose an FT-DCRN dehazing algorithm that uses fine transmission images and atmospheric light values to compute dehazed images. The DCRN is used to obtain the coarse transmission image, which can not only expand the receptive field of the network, but can also retain the features to maintain the nonlinearity of the overall network. The fine transmission image is obtained by refining the coarse transmission image using a guided filter. The atmospheric light value is estimated according to the position and brightness of the pixels in the original hazy image. Second, we use the dehazed images generated by the FT-DCRN dehazing algorithm for 3D reconstruction. The ARI-SFM algorithm, which obtains the fine matching corner pairs and reduces the number of iterations, establishes an accurate one-to-one matching corner relationship. The experimental results show that our FT-DCRN dehazing algorithm improves the accuracy compared to other representative algorithms. In addition, the ARI-SFM algorithm guarantees the precision and improves the efficiency.
Developing AI systems supporting smart cities requires considerable data. Through the acquisition of effective information, smart cities can truly become sustainable developments. By 2021, one billion smart cameras will be deployed in infrastructure and commercial buildings. The large amount of raw data collected is far beyond the scope that can be viewed, processed or analyzed manually. Through the machine learning training process, images can be analyzed for city planning and development. AI algorithms have become the developmental trend and key point of smart cities [35]; therefore, how to manage deep learning algorithms, data, software, hardware and services will become another problem in the future.