Article

Feature Detection of Focused Plenoptic Camera Based on Central Projection Stereo Focal Stack

1 Longshore Defense Troop Academy, Naval Aeronautical University, Yantai 264001, China
2 State Key Laboratory of Pulsed Power Laser Technology, College of Advanced Interdisciplinary Studies, National University of Defense Technology, Changsha 410073, China
3 State Key Laboratory of High Performance Computing (HPCL), College of Computer, National University of Defense Technology, Changsha 410073, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(21), 7632; https://doi.org/10.3390/app10217632
Submission received: 14 August 2020 / Revised: 20 October 2020 / Accepted: 26 October 2020 / Published: 29 October 2020

Abstract

Fast and accurate feature extraction can lay a solid foundation for scene reconstruction and visual odometry. However, this has been a rather difficult problem for the focused plenoptic camera. In this paper, to the best of our knowledge, we introduce the first accurate and fast feature extraction algorithm based on the central projection stereo focal stack (CPSFS). Specifically, we propose a refocusing algorithm that conforms to central projection with regard to the center of the main lens, which is more accurate than the traditional one. On this basis, features are extracted on the CPSFS without calculating dense depth maps or total focus images. We verify the precision and efficiency of the proposed algorithm through simulated and real experiments, and give an example of scene reconstruction based on the proposed method. The experimental results show that our feature extraction algorithm is able to support feature-based scene reconstruction via the focused plenoptic camera.

1. Introduction

Plenoptic (or light field) cameras have become increasingly developed and widespread in recent years. In contrast with traditional pinhole cameras, they capture the light distribution in both the spatial and angular dimensions with the help of an inserted micro-lens array (MLA). According to the MLA placement, plenoptic cameras can be classified into unfocused [1] and focused plenoptic cameras [2]. In an unfocused plenoptic camera (like the Lytro), the main lens focuses the scene onto the MLA. Focused plenoptic cameras can be further classified into two types, with the MLA behind (Keplerian) or in front of (Galilean) the focused main lens image. As a special case of focused plenoptic cameras, multi-focus plenoptic cameras (like the Raytrix) use multiple types of micro-lenses to further extend the depth of field. Due to its better ability to recover depth information compared with the unfocused plenoptic camera, the focused plenoptic camera is gaining more attention in fields like structure from motion (SFM) and visual odometry.
Robust and accurate feature detection methods are the foundation of many SFM algorithms [3,4,5]. At present, many works address feature extraction for the focused plenoptic camera. Bok et al. [6] propose a checkerboard line feature detector performed on the raw image, and Nousias et al. [7] present a checkerboard corner detector. Liu et al. [8] introduce an adaptive checkerboard corner detector, which can select the sharpest corners automatically. However, the methods mentioned above [6,7,8] are designed for specific feature patterns, which are hard to use in SFM. Ferreira et al. [9] use the scale-invariant feature transform (SIFT) detector [10] to extract features and calculate depth on the raw image; however, the small size of the micro-images makes the results less robust and accurate. Dansereau et al. [11] propose the LiFF feature detector for the unfocused plenoptic camera based on the focal stack; however, the locations of the detected features are restricted to the central sub-aperture image rather than the whole raw image. Kühefuß et al. [12] apply the SURF detector directly to the total focus image; however, the computation of the dense depth map and the total focus image is very time-consuming. Although it is easier to extract features from the total focus image, the traditional refocusing algorithm of the focused plenoptic camera has some limitations. It is based on the concept of virtual depth [2], which is widely used in the calculation of depth maps and total focus images [9,13,14,15,16,17]. However, the theoretical imaging range of a refocused image based on the traditional method [2] is limited by the sensor size, and the traditional focal stack does not conform to central projection in theory. Besides, most methods [9,13,14,15,16,17] ignore the difference between the centers of the micro-lenses and the micro-images when calculating depth maps and total focus images, which causes the calculated results to deviate from the theoretical model.
In order to extract features more accurately and efficiently, we perform feature extraction on the central projection stereo focal stack (CPSFS). Specifically, we present a refocusing model that conforms to central projection with regard to the center of the main lens, so that the calculated refocused image is strictly consistent with the theoretical model. We then propose a feature detection algorithm performed on the CPSFS, building on the stereo focal stack proposed by Hog et al. [17]. We analyze the relationship between the layer selection of the CPSFS and the detection results through simulated experiments. Furthermore, the proposed feature detection method is tested on both simulated and real data. The experimental results indicate that the proposed method performs well in terms of speed, accuracy, and robustness to noise. A scene reconstruction example is also given to demonstrate that the proposed method is able to support scene reconstruction via the focused plenoptic camera.
The main innovations of this paper are as follows.
(1) An accurate refocusing model that conforms to central projection with regard to the center of the main lens.
(2) A fast and accurate feature extraction algorithm based on the CPSFS, which can support efficient feature-based SFM via the focused plenoptic camera without calculating dense depth maps or total focus images.

2. Refocusing Model Conforming to Central Projection

For the convenience of the following discussion, we first explain the coordinate systems and relevant symbols used in this paper. The camera coordinate system $OXYZ$ has its origin at the center of the main lens and its $Z$-axis along the optical axis. The raw image pixel coordinate system $ouv$ and the refocused image pixel coordinate system $ost$ are established as shown in Figure 1. $b$ represents the displacement from the main lens to the sensor and $B$ the displacement from the micro-lens array (MLA) to the sensor; note that both $b$ and $B$ are negative. Besides, $f_L$ indicates the focal length of the main lens, which is positive.
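For concreteness, these constants (with the simulated values from Table 1) can be collected in a small container that the code sketches later in this paper reuse. The grouping below is our own illustration rather than part of the released implementation:

```python
from dataclasses import dataclass

@dataclass
class PlenopticParams:
    """Camera constants following the sign conventions above (b, B < 0, f_L > 0).
    Default values are the simulated ones from Table 1."""
    f_L: float = 100.0    # focal length of the main lens (mm)
    b: float = -104.5     # displacement from the main lens to the sensor (mm)
    B: float = -1.32      # displacement from the MLA to the sensor (mm)
    s_x: float = 5.5e-3   # physical pixel size along u (mm)
    s_y: float = 5.5e-3   # physical pixel size along v (mm)
```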

2.1. Problems of the Traditional Refocusing Algorithm

For the traditional refocusing algorithm [2], given the virtual depth $v_F$ of any refocusing point $F$, the projection point $p$ on the raw image can be calculated through the following equation:

$$\begin{bmatrix} 1 & 0 & (p_u - l_u) \\ 0 & 1 & (p_v - l_v) \end{bmatrix} \begin{bmatrix} F_u \\ F_v \\ v_F \end{bmatrix} = \begin{bmatrix} l_u \\ l_v \end{bmatrix} \tag{1}$$

In Equation (1), $(p_u, p_v)$ denotes the pixel coordinates of $p$ in the $ouv$ coordinate system, $(l_u, l_v)$ the orthographic projection coordinates of the center of the micro-lens associated with $p$ in $ouv$, and $(F_u, F_v)$ the orthographic projection coordinates of $F$ in $ouv$. By weighting the pixel values of all projection points, the pixel value of $F$ in the refocused image is obtained. However, due to the use of orthographic projection coordinates and virtual depth, the traditional refocusing model has the following problems.
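Each row of Equation (1) rearranges to $p_u = l_u + (l_u - F_u)/v_F$ (and likewise for $v$), giving one raw-image projection per micro-lens. The following minimal sketch, our illustration only, makes this explicit:

```python
def project_traditional(F_u, F_v, v_F, l_u, l_v):
    """Raw-image projection of refocus point F through one micro-lens, from
    Eq. (1): F_u + v_F*(p_u - l_u) = l_u  =>  p_u = l_u + (l_u - F_u)/v_F.
    All quantities are pixel coordinates in the ouv system."""
    p_u = l_u + (l_u - F_u) / v_F
    p_v = l_v + (l_v - F_v) / v_F
    return p_u, p_v
```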

2.1.1. Limited Theoretical Imaging Range

For any point $F$ on a given refocus plane, the traditional method restricts $(F_u, F_v)$ to the orthographic footprint of the sensor in order to keep the size of the refocused images the same. Therefore, the imaging range of the refocus plane is limited. For example, although the raw image contains multiple projections of $Q$, not every refocused image can image $Q$, as shown in Figure 2a.

2.1.2. Diverse Theoretical Imaging Positions of Same Object among Focal Stack

For a raw image with $N_u \times N_v$ pixels, the size of the refocused image can be determined as $(d\,N_u, d\,N_v)$, with $d$ representing the scale factor. Given an arbitrary imaging point $Q$ with coordinates $(Q_u, Q_v)$ in $ouv$, let $G$ denote the center of the imaging circle of $Q$ in a certain refocus plane, as shown in Figure 2a. The pixel coordinates of $G$ in $ost$ are

$$(G_s, G_t) = d\,\frac{G_z}{Q_z}\,(Q_u, Q_v), \tag{2}$$

with $(G_x, G_y, G_z)$ and $(Q_x, Q_y, Q_z)$ standing for the coordinates of $G$ and $Q$ in $OXYZ$, respectively. When $d$ is fixed, $(G_s, G_t)$ changes with $G_z$. The theoretical imaging positions of the same point therefore differ across the focal stack, which makes feature matching within a traditional focal stack more difficult.

2.1.3. Deviation between Calculated Result and Theoretical Model

As the central coordinates of the micro-lenses cannot be obtained accurately before calibration, the traditional refocusing method [2] uses the central coordinates of the micro-images instead during the actual computation, which makes the calculated refocused (total focus) image inconsistent with the theoretical model. As shown in Figure 2b, the correct projection point corresponding to $F$ is $p$, while the point actually used is $p'$. This introduces errors into the subsequent SFM algorithm during scene reconstruction.

2.2. Central Projection Refocusing Model Based on Plenoptic Disc Data

The plenoptic disc data was proposed by O’Brien et al. [18] in 2018 to improve the calibration accuracy of the plenoptic camera. As shown in Figure 3a, $(F_u^S, F_v^S)$ represents the central projection coordinates of $F$ in $ouv$ with regard to the center of the main lens, and $|R_F|\,r_{mi}$ stands for the maximum pixel distance between $(F_u^S, F_v^S)$ and the center of a micro-image that contains a projection point of $F$, with $r_{mi}$ denoting the pixel radius of a micro-image. The vector $(F_u^S, F_v^S, R_F)$ is called the plenoptic disc data of $F$, while $(F_u^S, F_v^S)$ and $R_F$ are called the plenoptic disc center and the plenoptic disc radius, respectively. $R_F$ can be calculated by

$$R_F = \frac{b\,(b - B - F_z)}{F_z\, B}, \tag{3}$$

with $F_z$ standing for the $Z$-coordinate of $F$ in $OXYZ$. The relation between $R_F$ and $v_F$ is

$$R_F = \frac{b\, v_F}{b - B - v_F\, B}. \tag{4}$$
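Equations (3) and (4) translate directly into code. The sketch below is our illustration, not the authors' released code; as a consistency check, the Table 1 camera reproduces the disc radii used later in Section 4.1.1:

```python
def disc_radius_from_depth(F_z, b, B):
    """Plenoptic disc radius of a point with Z-coordinate F_z, Eq. (3)."""
    return b * (b - B - F_z) / (F_z * B)

def disc_radius_from_virtual_depth(v_F, b, B):
    """Plenoptic disc radius expressed through the virtual depth v_F, Eq. (4)."""
    return b * v_F / (b - B - v_F * B)

# Check against the Table 1 camera (b = -104.5 mm, B = -1.32 mm): the virtual
# depths v_F = -6 and v_F = -12 used in Section 4.1.1 give R_F = -5.64 and -10.54.
print(round(disc_radius_from_virtual_depth(-6.0, -104.5, -1.32), 2))    # -5.64
print(round(disc_radius_from_virtual_depth(-12.0, -104.5, -1.32), 2))   # -10.54
```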
With the help of the plenoptic disc data, the calculated focal stack conforms to central projection with regard to the center of the main lens, as shown in Figure 3b. Given the plenoptic disc data $(F_u^S, F_v^S, R_F)$ of $F$, the projected point $(p_u, p_v)$ on the raw image can be obtained by

$$\begin{bmatrix} 1 & 0 & (p_u - i_u) \\ 0 & 1 & (p_v - i_v) \end{bmatrix} \begin{bmatrix} F_u^S \\ F_v^S \\ R_F \end{bmatrix} = \begin{bmatrix} i_u \\ i_v \end{bmatrix} \tag{5}$$

In Equation (5), $(i_u, i_v)$ denotes the central coordinates, in $ouv$, of the micro-image containing $(p_u, p_v)$. As illustrated in Figure 3b, the refocused image rendered according to Equation (5) makes full use of the light recorded by the sensor at any refocus plane. This is achieved simply by restricting the central projection coordinates $(F_u^S, F_v^S)$, rather than the orthographic coordinates $(F_u, F_v)$, to the range of the raw image. Besides, given any imaging point $Q$, the pixel coordinates $(G_s, G_t)$ of $G$ in $ost$ are

$$(G_s, G_t) = d\,(Q_u^S, Q_v^S), \tag{6}$$

with $(Q_u^S, Q_v^S)$ representing the central projection coordinates of $Q$ in $ouv$. When $d$ is fixed, the coordinates $(G_s, G_t)$ are strictly identical across the refocused images. What is more, the use of the central coordinates of micro-images in Equation (5) makes the calculated results agree with the theoretical model without the help of camera calibration.
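The projection encoded by Equation (5) has the same closed form as Equation (1), but is anchored at micro-image centers and parameterized by the plenoptic disc radius. Again as our own minimal sketch:

```python
def project_central(F_uS, F_vS, R_F, i_u, i_v):
    """Raw-image projection under the central projection model, from Eq. (5):
    F_u^S + R_F*(p_u - i_u) = i_u  =>  p_u = i_u + (i_u - F_u^S)/R_F.
    (i_u, i_v) is a micro-image center; one projection per micro-image."""
    p_u = i_u + (i_u - F_uS) / R_F
    p_v = i_v + (i_v - F_vS) / R_F
    return p_u, p_v
```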
To sum up, central projection refocused images have better properties than traditional ones, which provides support for accurate feature extraction.

2.3. Fast Central Projection Refocused Image Rendering Based on Micro-Image

In order to increase the rendering speed, we fill the refocused image with the micro-images extracted from the raw image. Specifically, we adopt the ray-per-pixel approximation [19] and assume that each pixel in a micro-image records the intensity of a single light ray. For any micro-image, let the coordinates of any non-central point in $ouv$ be $(p_u, p_v)$. Then, from similar triangles, we get

$$\frac{(p_u - i_u)\, s_x}{(F_s^p - F_s^i)\, s_x^F} = \frac{(p_v - i_v)\, s_y}{(F_t^p - F_t^i)\, s_y^F} = \frac{B}{F_z - b + B} \tag{7}$$

In Equation (7), $(s_x, s_y)$ and $(s_x^F, s_y^F)$ indicate the physical size of a single pixel in the raw image and in the refocused image containing $F$, respectively, and $(F_s^p, F_t^p)$ and $(F_s^i, F_t^i)$ represent the pixel coordinates of the points $F^p$ and $F^i$ in $ost$, respectively, as demonstrated in Figure 4. Based on similar triangles, $(s_x^F, s_y^F) = (F_z s_x/(d\,b),\ F_z s_y/(d\,b))$. Furthermore, combining Equations (3) and (7) yields

$$(F_s^p, F_t^p) - (F_s^i, F_t^i) = -d\,R_F\,\big[(p_u, p_v) - (i_u, i_v)\big] \tag{8}$$

Intuitively, the light rays recorded by a single micro-image correspond to a circular area in the refocused image, whose pixel radius is $d\,|R_F|$ times that of the micro-image in the raw image. In this paper, we traverse all the micro-images and fill the refocused image with the resized circular patch extracted from each micro-image. After the traversal is complete, the pixel values of the refocused image are weighted accordingly.
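The traversal described above can be sketched as a short NumPy routine. This is a minimal illustration under several assumptions: the micro-image centers $(i_u, i_v)$ and pixel radius $r_{mi}$ are known, $u$ is treated as the row index, sampling is nearest-neighbour, and all contributions are weighted uniformly; the signed patch magnification $-dR_F$ follows Equation (8) and is positive here since $R_F < 0$:

```python
import numpy as np

def render_refocused(raw, centers, r_mi, R_F, d):
    """Render a central projection refocused image by splatting micro-images.

    raw     : (H, W) raw image array.
    centers : iterable of micro-image centers (i_u, i_v) in raw-image pixels.
    r_mi    : pixel radius of a micro-image.
    R_F     : plenoptic disc radius of the refocus plane.
    d       : scale factor; the refocused image has shape (d*H, d*W).
    """
    H, W = raw.shape
    Hs, Ws = int(round(d * H)), int(round(d * W))
    acc = np.zeros((Hs, Ws))
    wgt = np.zeros((Hs, Ws))
    mag = -d * R_F                 # signed magnification from Eq. (8)
    r_out = abs(mag) * r_mi        # radius of the splatted circular patch
    for i_u, i_v in centers:
        cs, ct = d * i_u, d * i_v  # patch centre: refocused image of F^i
        s0, s1 = max(int(np.floor(cs - r_out)), 0), min(int(np.ceil(cs + r_out)) + 1, Hs)
        t0, t1 = max(int(np.floor(ct - r_out)), 0), min(int(np.ceil(ct + r_out)) + 1, Ws)
        if s0 >= s1 or t0 >= t1:
            continue
        S, T = np.meshgrid(np.arange(s0, s1), np.arange(t0, t1), indexing="ij")
        # Invert Eq. (8): map each refocused pixel back into the micro-image.
        pu = i_u + (S - cs) / mag
        pv = i_v + (T - ct) / mag
        ok = ((pu - i_u) ** 2 + (pv - i_v) ** 2 <= r_mi ** 2) \
             & (pu >= 0) & (pu < H) & (pv >= 0) & (pv < W)
        acc[S[ok], T[ok]] += raw[pu[ok].astype(int), pv[ok].astype(int)]
        wgt[S[ok], T[ok]] += 1.0
    return acc / np.maximum(wgt, 1.0)  # weight (here: average) overlapping patches
```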

3. Feature Detection Based on CPSFS

As multiple features on the raw image are associated with the same imaging point, feature detection for the focused plenoptic camera is essentially the extraction of imaging points, which comprises the calculation of their coordinates (plenoptic disc data) and descriptors. It is easy to extract the plenoptic disc center and feature descriptor of an imaging point from the central projection focal stack, but this is not the case for the plenoptic disc radius. Hog et al. [17] propose the stereo focal stack, which is formed by using only half of each micro-image during refocusing; depth information can then be obtained from the parallax within the stereo focal stack. However, Hog et al. [17] still use the traditional refocusing model and ignore the difference between the centers of micro-lenses and micro-images, which means that the actual calculated results are not strictly consistent with the theoretical model. Considering this, we propose a fast and accurate feature detection algorithm based on the CPSFS.

3.1. Central Projection Stereo Refocused Image Pair with Parallax

Based on the algorithm in Section 2.3, we obtain two refocused images by utilizing half of each micro-image (the left semicircle and the right semicircle). These two refocused images are called a central projection stereo refocused image pair in this paper. The CPSFS is composed of multiple central projection stereo refocused image pairs from different refocus planes.
In the following, we analyze the relationship between the parallax of a central projection stereo refocused image pair and the plenoptic disc radius. Assume that the main lens has a circular aperture. Using a semicircular micro-image during refocusing is then equivalent to blocking half of the main lens aperture. If $Q$ is the imaging point of a point light source, there will be two defocus semicircles in the left and right refocused images, as shown in Figure 5a. The parallax between these two semicircles can be approximated by the parallax between their barycenters along the $s$-axis of the coordinate system $ost$.
It can be seen from Figure 5a that the barycenters $C^L$ and $C^R$ of the two semicircles in different refocus planes always correspond to the same two points $p^l$ and $p^r$ in the raw image. If the micro-lenses are regarded as continuously distributed pinholes, then there are corresponding micro-image centers $i^l$ and $i^r$ with regard to $p^l$ and $p^r$. According to Equation (8), the parallax along the $s$-axis is

$$\epsilon_Q = d\,R_F\,(D_u - \delta_u) + d\,D_u, \tag{9}$$

with $\epsilon_Q = C_s^L - C_s^R$, $D_u = i_u^l - i_u^r$, and $\delta_u = p_u^l - p_u^r$, where $(p_u^l, p_v^l)$, $(p_u^r, p_v^r)$, $(i_u^l, i_v^l)$, and $(i_u^r, i_v^r)$ are the pixel coordinates of $p^l$, $p^r$, $i^l$, and $i^r$ in $ouv$, respectively, $C_s^L$ and $C_s^R$ are the $s$-coordinates of $C^L$ and $C^R$ in $ost$, and $R_F$ is the plenoptic disc radius of the current refocus plane. Based on Equation (5), the plenoptic disc radius of $Q$ is $R_Q = D_u/(\delta_u - D_u)$. Thus, Equation (9) turns into

$$\epsilon_Q = -d\,D_u\,R_F/R_Q + d\,D_u. \tag{10}$$

As one can see, there is a linear relationship between $\epsilon_Q$ and $R_F$. According to the centroid formula of a semicircle, $D_u = -8\,R_Q\,r_{mi}/(3\pi)$. Furthermore, we get

$$R_Q = R_F - \frac{3\pi\,\epsilon_Q}{8\, r_{mi}\, d}. \tag{11}$$

In this way, the plenoptic disc radius can be calculated from the parallax of a central projection stereo refocused image pair. In practice, however, the calculated $R_Q$ is error-prone due to noise and mismatching. Therefore, a more robust way is to use a CPSFS with two or more layers. The details are described in Section 3.2 and Section 3.3.
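As reconstructed above, Equation (11) and the validity gate that Section 3.2 derives from it ($|R_Q - R_F| < \theta_R$) reduce to a few lines. The function below is our own sketch, with $\theta_R$ defaulting to the value used in the experiments:

```python
import math

def disc_radius_from_parallax(eps_Q, R_F, r_mi, d, theta_R=5.0):
    """Plenoptic disc radius of Q from a single stereo pair, Eq. (11):
    R_Q = R_F - 3*pi*eps_Q / (8*r_mi*d).
    Parallaxes violating |R_Q - R_F| < theta_R, i.e.
    |eps_Q| >= 8*d*r_mi*theta_R/(3*pi), are rejected (cf. Section 3.2)."""
    if abs(eps_Q) >= 8.0 * d * r_mi * theta_R / (3.0 * math.pi):
        return None                  # likely a mismatch; discard
    return R_F - 3.0 * math.pi * eps_Q / (8.0 * r_mi * d)
```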

3.2. Calculation of Plenoptic Disc Data Based on Two-Layer CPSFS

Based on Equation (10), we can obtain the two parallaxes $\epsilon_{Q1}$ and $\epsilon_{Q2}$ of $Q$ in two different refocus planes with plenoptic disc radii $R_{F1}$ and $R_{F2}$. Then $R_Q$ can be obtained by

$$R_Q = \frac{R_{F2}\,\epsilon_{Q1} - R_{F1}\,\epsilon_{Q2}}{\epsilon_{Q1} - \epsilon_{Q2}}. \tag{12}$$

In practice, we use the SIFT detector [10] (though the method is not limited to SIFT) to extract features in the central projection stereo refocused image pairs and perform feature matching along the $s$-axis. Let the detected features in the two layers be $C^L$, $C^R$ and $C^{L'}$, $C^{R'}$, as illustrated in Figure 5b. Thus, we have $\epsilon_{Q1} = C_s^L - C_s^R$ and $\epsilon_{Q2} = C_s^{L'} - C_s^{R'}$. Due to detection errors and mismatching, $\epsilon_{Q1}$ and $\epsilon_{Q2}$ should be limited to a valid range. Therefore, we restrict $|R_Q - R_F| < \theta_R$ and obtain $|\epsilon_Q| < 8\,d\,r_{mi}\,\theta_R/(3\pi)$ based on Equation (11).

Then, the matched feature pairs $C^L$, $C^R$ and $C^{L'}$, $C^{R'}$ from different layers of the CPSFS should themselves be matched. This is rather simple for the CPSFS. As illustrated in Figure 5b, the points $G$ and $G'$ are located at the centers of the corresponding matched feature pairs $C^L$, $C^R$ and $C^{L'}$, $C^{R'}$. According to Equation (6), we have

$$(Q_u^S, Q_v^S) = \frac{1}{d}\left(\frac{C_s^L + C_s^R}{2},\ \frac{C_t^L + C_t^R}{2}\right) = \frac{1}{d}\left(\frac{C_s^{L'} + C_s^{R'}}{2},\ \frac{C_t^{L'} + C_t^{R'}}{2}\right). \tag{13}$$

Therefore, given a fixed $d$, the plenoptic disc data of a SIFT feature point $Q$ can be calculated by Equations (12) and (13). The descriptor of $Q$ is the average of the descriptors of $C^L$, $C^R$, $C^{L'}$, and $C^{R'}$. In order to ensure the number and accuracy of the detected features, the two layers of the CPSFS should be relatively close to the actual imaging point $Q$.
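A direct transcription of Equations (12) and (13), together with the descriptor averaging just described, might look as follows; the function and argument names are ours, and degenerate matches with $\epsilon_{Q1} = \epsilon_{Q2}$ are assumed to have been removed by the parallax gate above:

```python
import numpy as np

def plenoptic_disc_two_layer(C_L1, C_R1, C_L2, C_R2, R_F1, R_F2, d, descs):
    """Plenoptic disc data of one feature Q from a two-layer CPSFS (Section 3.2).

    C_L1, C_R1 : (s, t) coordinates of the matched feature in the left/right
                 refocused images of layer 1; C_L2, C_R2 likewise for layer 2.
    descs      : the four SIFT descriptors of C^L, C^R, C^L', C^R'.
    """
    eps1 = C_L1[0] - C_R1[0]                    # parallax in layer 1 (s-axis)
    eps2 = C_L2[0] - C_R2[0]                    # parallax in layer 2
    R_Q = (R_F2 * eps1 - R_F1 * eps2) / (eps1 - eps2)        # Eq. (12)
    # Plenoptic disc centre from the midpoint of either matched pair, Eq. (13)
    Q_uS = (C_L1[0] + C_R1[0]) / (2.0 * d)
    Q_vS = (C_L1[1] + C_R1[1]) / (2.0 * d)
    desc = np.mean(descs, axis=0)               # averaged feature descriptor
    return (Q_uS, Q_vS, R_Q), desc
```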

3.3. Calculation of Plenoptic Disc Data Based on Multi-Layer CPSFS

In practice, a real scene usually spans a range of depths. As a detection method based on a two-layer CPSFS cannot effectively detect features across the whole scene, we solve this problem by using a CPSFS with multiple layers.
Specifically, there are $N$ layers of refocus planes with plenoptic disc radii $R_m = R_0 - m\,\Delta R$, $m = 1, \dots, N$ ($N > 2$, $\Delta R > 0$). For each pair of layers out of the $N$ refocus planes, the plenoptic disc data of SIFT features are calculated using the algorithm in Section 3.2; a sketch of this loop is given below. In order to avoid repeated detection, detected features with close plenoptic disc centers and similar descriptors are regarded as the same feature, and we average the plenoptic disc data and descriptors of all repeated detections.
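The multi-layer procedure can be sketched as the following loop. The two-layer step is passed in as a callable standing for the Section 3.2 procedure, and the merging tolerances are illustrative assumptions, since the paper does not specify them numerically:

```python
import numpy as np
from itertools import combinations

def detect_multilayer(stereo_pairs, disc_radii, d, detect_two_layer,
                      center_tol=2.0, desc_tol=0.5):
    """Feature detection over an N-layer CPSFS (Section 3.3).

    stereo_pairs     : list of N central projection stereo refocused image pairs.
    disc_radii       : the N plenoptic disc radii R_m of the refocus planes.
    detect_two_layer : callable implementing the Section 3.2 procedure; returns
                       a list of ((Q_uS, Q_vS, R_Q), descriptor) tuples.
    center_tol, desc_tol : merging thresholds (our assumed values).
    """
    candidates = []
    for m, n in combinations(range(len(stereo_pairs)), 2):   # every layer pair
        candidates += detect_two_layer(stereo_pairs[m], stereo_pairs[n],
                                       disc_radii[m], disc_radii[n], d)
    merged = []   # entries: [disc data (3,), descriptor, detection count]
    for disc, desc in candidates:
        disc, desc = np.asarray(disc, float), np.asarray(desc, float)
        for entry in merged:
            close = np.hypot(*(disc[:2] - entry[0][:2])) < center_tol
            similar = np.linalg.norm(desc - entry[1]) < desc_tol
            if close and similar:           # repeated detection: running average
                entry[0] = (entry[0] * entry[2] + disc) / (entry[2] + 1)
                entry[1] = (entry[1] * entry[2] + desc) / (entry[2] + 1)
                entry[2] += 1
                break
        else:
            merged.append([disc, desc, 1])
    return [(e[0], e[1]) for e in merged]
```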
The choice of the parameters $R_0$, $N$, and $\Delta R$ directly affects the detection results. Generally speaking, when $\Delta R$ is about 1–2, the algorithm achieves good results and speed. The choice of $R_0$ and $N$ should depend on the working distance of the camera; empirically, $R_0$ is about 3–5 in magnitude and $N$ is about 5–8. The detailed simulations are shown in Section 4.1.2 and Section 4.1.3.

4. Experiments

In this part, simulated and real experiments are carried out to demonstrate the precision and efficiency of the proposed method. The experimental results are obtained on the Windows 7 operating system with an Intel Core i7-7700 CPU (3.6 GHz). All code is available online at https://github.com/samliu0631/Feature-Detection-using-CPSFS.

4.1. Simulated Experiments

We simulate the raw images of the multi-focus plenoptic camera based on forward ray tracing [20]. On this basis, the proposed central projection refocusing algorithm is compared with the traditional one [2] using simulated data. Besides, the performance of the proposed feature detection method with regard to accuracy, speed, and anti-noise capacity is tested on the simulated raw images.

4.1.1. Comparison of Refocusing Algorithm

The simulated parameters of the multi-focus plenoptic camera are shown in Table 1. As all parameters, including the positions of the micro-lenses, are known in advance, the refocused images can be generated strictly in accordance with both the traditional refocusing algorithm [2] and ours.
In order to display the results more clearly, a checkerboard plane is used as the imaging target, placed 0.9 m away from the main lens and perpendicular to the optical axis. The simulated raw images are shown in Figure 6a,b. During the experiment, the traditional refocusing algorithm [2] is used to render refocused images at planes with virtual depths $v_F \in \{-6, -12\}$, as illustrated in Figure 6c,d. According to Equation (4), the corresponding plenoptic disc radii are $R_F \in \{-5.64, -10.54\}$. The images rendered by our central projection refocusing algorithm are shown in Figure 6e,f.
It can be seen that the imaging range of the traditional refocused image becomes smaller as $|v_F|$ increases, while that of the proposed method remains the same regardless of the change of $R_F$. Moreover, in our refocused images the coordinates of the imaging center corresponding to the same object point remain the same irrespective of the position of the refocus plane, which is not the case for the traditional method. As one can see, the experimental results are consistent with the analysis in Section 2.
During the actual calculation of refocused or total focus images, most methods [9,13,14,15,16,17] use the center of the micro-image to approximate the center of the micro-lens. Essentially, this is equivalent to refocusing based on Equation (5) rather than Equation (1). If the concepts of virtual depth and orthographic coordinates are still used during reconstruction without further rectification, errors are introduced into the final reconstructed result. To verify this, Equation (5) is used to calculate the total focus image ($R_F = -6.55$) corresponding to the simulated raw image. Then, the checkerboard corners on the total focus image are converted to real space based on the traditional model and on ours. The reconstruction results are shown in Figure 7. It is obvious that the results of the traditional method deviate from the ground truth, while ours do not.
The experimental results demonstrate that the central projection refocusing model in this paper is more consistent with the actual results.

4.1.2. Relation between Layer Distance in CPSFS and Detected Results

The purpose of the experiment in this part is to examine the relationship between the layer distance selected in the CPSFS and the feature detection results. To simplify the problem, the detection method with a two-layer CPSFS is used during the experiment. The simulated parameters of the camera are shown in Table 1, and textured pictures are used to simulate the planar targets. The evaluated indicators include the detection speed and the number and depth accuracy of the detected points.
During the experiment, the planar target is placed at a fixed position in front of the camera, perpendicular to the optical axis, which forms an imaging plane at the position with plenoptic disc radius $R_{gt}$. The two refocus planes of the CPSFS are selected at $R_{gt} + \Delta R/2$ and $R_{gt} - \Delta R/2$, respectively. We change the value of $\Delta R$ within the range $\{\Delta R \mid \Delta R = 0.5\,i,\ i = 1, \dots, 12\}$ and observe the change in the detection results.
For generality of the results, we carry out 20 independent feature detection experiments on 20 different simulated raw images for each $\Delta R$. The evaluation indexes include the total number of detected features, the mean error and the root mean square error (RMSE) of the detected features' plenoptic disc radii, and the detection time. Besides, we repeat the above experiment three times with $R_{gt}$ set to $-4$, $-8$, and $-12$, respectively. The other parameters are set as $d = 1/2$ and $\theta_R = 5$. The results are presented in Figure 8.
It can be seen that the trends of the three curves in Figure 8 are similar, which reflects the general relationship between the layer distance of the CPSFS and the detection results. When the positions of the two refocus planes are close to $R_{gt}$ ($\Delta R$ is small), both the number of detected feature points and the RMSE of the detected features' plenoptic disc radii are large. This is because the closer the refocus planes are to $R_{gt}$, the sharper the refocused images, which makes it easier to detect more feature points; at the same time, however, the parallax within the central projection stereo refocused image pair is relatively small, which increases the calculation error of the plenoptic disc radius. As can be seen from Figure 8b, the mean error is close to 0, which indicates that the proposed algorithm is unbiased. Figure 8d illustrates that the proposed algorithm is time-efficient, with a mean detection time of about 20 s per frame. An example of the detection results is illustrated in Figure 9.
Based on the analysis above, we conclude that a close layer distance should be kept in the CPSFS in order to ensure the number and accuracy of the detected features. Empirically speaking, $\Delta R$ can be chosen within $[1, 2]$.

4.1.3. Performance of Detection Algorithm Based on Multi-Layer CPSFS

In this part, we carry out feature detection using an $L$-layer CPSFS on simulated raw images, in order to test the performance of the proposed method in scenes with different depth ranges. Specifically, different planar targets are placed perpendicular to the optical axis in succession, so that the targets form multiple imaging planes with plenoptic disc radii in the range $\{R_{gt} \mid R_{gt} = -2i,\ i = 1, 2, \dots, 8\}$. We carry out feature detection with the $L$-layer CPSFS and observe the change in the detection results with regard to $R_{gt}$.
For each value of $R_{gt}$, 20 independent feature detection experiments are conducted on 20 simulated raw images with different textures. In order to reflect the relation between the number of layers in the CPSFS and the detection results, we change the value of $L$ and repeat the experiment three times. The value of $L$ is chosen from $\{2, 4, 8\}$, and the corresponding plenoptic disc radii of the refocus planes used in the CPSFS are $\{-3, -4\}$, $\{-3, -4, -5, -6\}$, and $\{-3, -4, -5, -6, -7, -8, -9, -10\}$, respectively. Note that $d = 1/2$ and $\theta_R = 5$ during the experiment.
The detection results are shown in Figure 10. As one can see, the more layers used in the CPSFS, the better the detection results, although the calculation time increases as well. In practice, we can first determine the approximate range of the plenoptic disc radius according to the working range of the focused plenoptic camera, and then layer the CPSFS accordingly. The detection results for the same raw image using the 2-layer and the 8-layer CPSFS are illustrated in Figure 11.
At this point, we can conclude that the proposed feature detection algorithm based on the multi-layer CPSFS is precise, effective, and time-efficient. In practice, the calculation efficiency can be further improved by layering the CPSFS reasonably.

4.1.4. Anti-Noise Capacity Test

In this part, the anti-noise capacity of the proposed method is tested on simulated raw images with noise. During the experiment, the planar object is placed at a fixed position to generate an imaging plane at $R_{gt} = -6$, and the CPSFS used has eight layers with $d = 1/4$ and $\theta_R = 5$. For generality of the results, we simulate 50 raw images from 50 different pictures. During the simulation, Gaussian noise with zero mean and standard deviation $\sigma$ is added to the simulated raw images. The detection results for different $\sigma$ (noise levels) are shown in Figure 12.
Although the number of detected points decreases as the noise level increases, the detection results still maintain high accuracy. In fact, the refocusing process is essentially a weighted average of the corresponding pixels in the raw image, which effectively improves the signal-to-noise ratio of the refocused image; therefore, the proposed method based on the CPSFS has good anti-noise ability. A specific detection example is presented in Figure 13. It can be seen that although the raw image is heavily polluted by noise, the proposed algorithm can still detect the SIFT [10] features in it with good accuracy.

4.2. Real Experiments

In order to further verify the effectiveness of the proposed algorithm, we use the Raytrix R29 camera to perform feature detection experiments on real captured raw images. Besides, we incrementally reconstruct the detected features of 11 captured raw images to prove that the detection results of the proposed method can be used as input for an SFM algorithm in scene reconstruction.

4.2.1. Feature Detection on Real Data

In this part, a 5-layer CPSFS is used, with corresponding plenoptic disc radii $R_F \in \{-4, -5.5, -7, -8.5, -10\}$. The parameters used in the algorithm are set as $d = 1/4$ and $\theta_R = 5$. An example of a generated central projection stereo refocused image pair is shown in Figure 14a,b, and the feature matching result of the pair is demonstrated in Figure 14c. In total, 11 real images are tested, and 7985 features are detected. The average calculation time per frame (6576 × 4384 pixels) is 86.3 s without any acceleration. A specific detection example is shown in Figure 14d.

4.2.2. Example of Scene Reconstruction

During the scene reconstruction experiment, we use our previous work [8] to calibrate the R29 camera (https://github.com/samliu0631/Stepwise-Calibration-for-plenoptic-camera). The relative poses of the initial two frames are estimated using the method proposed by Li et al. [21], and the absolute poses of the remaining nine frames are calculated according to the method proposed by Kneip et al. [22]. The sparse point cloud is incrementally reconstructed using a method similar to COLMAP [3]. An example of a captured raw image and the final reconstruction result are shown in Figure 15. Compared with traditional feature extraction using raw images or total focus images, the proposed method improves the detection efficiency while ensuring the detection accuracy, thereby improving the efficiency of scene reconstruction.

5. Conclusions

In this paper, we propose a fast and accurate feature detection algorithm suitable for focused plenoptic cameras, which can provide a reference for scene reconstruction. First, we propose a refocusing algorithm based on the concept of plenoptic disc data; the generated focal stack conforms to central projection with regard to the center of the main lens. On this basis, we propose a feature detection algorithm using the multi-layer CPSFS. The proposed method improves the detection efficiency while ensuring the detection accuracy. Both simulated and real experiments are carried out to prove the effectiveness of our method, and a specific example of scene reconstruction is given. The experimental results demonstrate that the feature detection method in this paper can provide a good reference for feature-based scene reconstruction via the focused plenoptic camera.

Author Contributions

All authors have contributed in some way to the concept and implementation of this paper. All authors contributed to the paper either during the writing or editing phases. All authors have read and agreed to the published version of the manuscript.

Funding

Research Grants from College of Advanced Interdisciplinary Studies, National University of Defense Technology (JC18-07).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ng, R.; Levoy, M.; Brédif, M.; Duval, G.; Horowitz, M.; Hanrahan, P. Light field photography with a hand-held plenoptic camera. Comput. Sci. Tech. Rep. CSTR 2005, 2, 1–11. [Google Scholar]
  2. Perwass, C.; Wietzke, L. Single lens 3D-camera with extended depth-of-field. In Proceedings of the Human Vision and Electronic Imaging XVII, International Society for Optics and Photonics, Burlingame, CA, USA, 23–26 January 2012; Volume 8291, p. 829108. [Google Scholar]
  3. Schonberger, J.L.; Frahm, J.M. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113. [Google Scholar]
  4. Nousias, S.; Lourakis, M.; Bergeles, C. Large-Scale, Metric Structure From Motion for Unordered Light Fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 3292–3301. [Google Scholar]
  5. Johannsen, O.; Sulc, A.; Goldluecke, B. On Linear Structure from Motion for Light Field Cameras. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 720–728. [Google Scholar]
  6. Bok, Y.; Jeon, H.G.; Kweon, I.S. Geometric calibration of micro-lens-based light field cameras using line features. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 287–300. [Google Scholar] [CrossRef]
  7. Nousias, S.; Chadebecq, F.; Pichat, J.; Keane, P.; Ourselin, S.; Bergeles, C. Corner-based geometric calibration of multi-focus plenoptic cameras. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 957–965. [Google Scholar]
  8. Liu, Q.; Xie, X.; Zhang, X.; Tian, Y.; Li, J.; Wang, Y.; Xu, X. Stepwise calibration of plenoptic cameras based on corner features of raw images. Appl. Opt. 2020, 59, 4209–4219. [Google Scholar] [CrossRef]
  9. Ferreira, R.; Goncalves, N. Fast and accurate micro lenses depth maps for multi-focus light field cameras. In German Conference on Pattern Recognition; Springer: Cham, Switzerland, 2016; pp. 309–319. [Google Scholar]
  10. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  11. Dansereau, D.G.; Girod, B.; Wetzstein, G. LiFF: Light field features in scale and depth. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 8042–8051. [Google Scholar]
  12. Kühefuß, A.; Zeller, N.; Quint, F.; Stilla, U. Feature Based RGB-D SLAM for a Plenoptic Camera. BW-CAR| SINCOM 2016, 25, 25–29. [Google Scholar]
  13. Palmieri, L.; Koch, R. Optimizing the Lens Selection Process for Multi-focus Plenoptic Cameras and Numerical Evaluation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1763–1774. [Google Scholar]
  14. Palmieri, L.; Koch, R.; Veld, R.O.H. The Plenoptic 2.0 Toolbox: Benchmarking of Depth Estimation Methods for MLA-Based Focused Plenoptic Cameras. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 649–653. [Google Scholar]
  15. Zeller, N.; Quint, F.; Stilla, U. Establishing a probabilistic depth map from focused plenoptic cameras. In Proceedings of the 2015 International Conference on 3D Vision (3DV), Lyon, France, 19–22 October 2015; pp. 91–99. [Google Scholar]
  16. Zeller, N.; Quint, F.; Stilla, U. Filtering probabilistic depth maps received from a focused plenoptic camera. BW-CAR| SINCOM 2015, 2, 7–12. [Google Scholar]
  17. Hog, M.; Sabater, N.; Vandame, B.; Drazic, V. An image rendering pipeline for focused plenoptic cameras. IEEE Trans. Comput. Imaging 2017, 3, 811–821. [Google Scholar] [CrossRef] [Green Version]
  18. O’Brien, S.; Trumpf, J.; Ila, V.; Mahony, R. Calibrating Light-Field Cameras Using Plenoptic Disc Features. In Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; pp. 286–294. [Google Scholar]
  19. Grossberg, M.D.; Nayar, S.K. The raxel imaging model and ray-based calibration. Int. J. Comput. Vis. 2005, 61, 119–137. [Google Scholar] [CrossRef]
  20. Zhang, R.; Liu, P.; Liu, D.; Su, G. Reconstruction of refocusing and all-in-focus images based on forward simulation model of plenoptic camera. Opt. Commun. 2015, 357, 1–6. [Google Scholar] [CrossRef]
  21. Li, H.; Hartley, R.; Kim, J.H. A linear approach to motion estimation using generalized camera models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8. [Google Scholar]
  22. Kneip, L.; Furgale, P.; Siegwart, R. Using multi-camera systems in robotics: Efficient solutions to the npnp problem. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 6–10 May 2013; pp. 3770–3776. [Google Scholar]
Figure 1. Illustration of coordinate systems.
Figure 2. Illustration of the traditional refocusing algorithm. (a) Traditional refocusing model. (b) Illustration of projection error.
Figure 3. Illustration of the proposed refocusing algorithm. (a) Illustration of plenoptic disc data. (b) Illustration of the central projection refocusing model.
Figure 4. Illustration of the relation between a micro-image and the refocused image.
Figure 5. Illustration of parallax in the central projection stereo focal stack (CPSFS). (a) Central projection stereo refocused image pair. (b) Parallax of Q in the CPSFS.
Figure 6. Simulated results of the refocusing experiment. (a) Simulated raw image with vignetting. (b) Simulated raw image without vignetting. (c) Traditional refocused image at $v_F = -6$. (d) Traditional refocused image at $v_F = -12$. (e) Our refocused image at $R_F = -5.64$. (f) Our refocused image at $R_F = -10.54$.
Figure 7. Reconstructed results of the simulated experiment. (a) Side view of the reconstructed results. (b) Top view of the reconstructed results. R represents results using the plenoptic disc radius, V the results using the concept of virtual depth, and GT the ground truth.
Figure 8. Detection results of the proposed algorithm using a 2-layer central projection stereo focal stack (CPSFS) on simulated raw images. (a) Number of detected features. (b) Mean error of the detected features' plenoptic disc radii (dimensionless). (c) Root mean square error (RMSE) of the detected features' plenoptic disc radii (dimensionless). (d) Total detection time of 20 independent experiments.
Figure 9. Example of detection results on a simulated raw image using the proposed algorithm with a 2-layer CPSFS, $R_{gt} = -4$ and $\Delta R = 2$. Different colors distinguish features projected from different plenoptic disc data.
Figure 10. Detection results of the proposed algorithm using a multi-layer CPSFS on simulated raw images. (a) Number of detected features with regard to $R_{gt}$. (b) Mean error of the detected features' plenoptic disc radii (dimensionless). (c) RMSE of the detected features' plenoptic disc radii (dimensionless). (d) Total detection time of 20 independent experiments.
Figure 11. Examples of detection results on simulated raw images. (a) Erroneous detection result using a 2-layer CPSFS. (b) Detection result using an 8-layer CPSFS. Panels (a,b) are based on the same raw image. Different colors distinguish features projected from different plenoptic disc data.
Figure 12. Detection results of the proposed algorithm using an 8-layer CPSFS on simulated raw images with noise. (a) Number of detected features during the noise test. (b) Mean error of the detected features' plenoptic disc radii (dimensionless). (c) RMSE of the detected features' plenoptic disc radii (dimensionless). (d) Total detection time of 50 independent experiments.
Figure 13. Detection result at noise level $\sigma = 0.3$ pixel. The size of the simulated raw image is 3000 × 2000 pixels. Different colors distinguish features projected from different plenoptic disc data.
Figure 14. Example of feature detection on real data. (a) Left refocused image. (b) Right refocused image. (c) Feature matching result of the central projection stereo refocused image pair. (d) Detected features on the raw image.
Figure 15. Sparse point cloud reconstruction results based on real data. (a) Example of a captured raw image (6576 × 4384 pixels). (b) Reconstructed sparse point cloud from 11 frames.
Table 1. Simulated parameters of the multi-focus plenoptic camera. $(s_x, s_y)$ stands for the physical size of a pixel in the raw image. $f_{m1}$, $f_{m2}$, and $f_{m3}$ represent the focal lengths of the three types of micro-lenses.

| $f_L$ (mm) | $b$ (mm) | $B$ (mm) | $s_x$ (μm) | $s_y$ (μm) | $f_{m1}$ (mm) | $f_{m2}$ (mm) | $f_{m3}$ (mm) |
|---|---|---|---|---|---|---|---|
| 100 | −104.5 | −1.32 | 5.5 | 5.5 | 1.62 | 1.92 | 2.35 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
