A method of single reference image based scene relighting


Our proposed method consists of four steps, as shown in Fig. 1: (1) the input image is segmented into a material map using the method of Bell et al. [1], so that every pixel is assigned a material label; (2) the reference image is warped to the structure of the input image by patch match warping; (3) each channel of the input image and of the reference image is decomposed into a large-scale layer and a detail layer under material constraints; (4) the final relit result is obtained by composing the detail layer of the input image with the large-scale layer of the warped reference image. Unlike methods that rely on inferred geometry [2,3], the proposed method requires no geometry inference.
The input image is segmented according to the material of each pixel. We use the method of Bell et al. [1] to obtain the material label of each pixel. Material segmentation is needed because different relighting operations should be applied in regions of different materials. We select nine kinds of materials that often appear in outdoor scene images, as shown in Fig. 2. Each pixel is recolored according to its material label to produce the material map. In Fig. 2, the first and third rows show the input images, and the second and fourth rows show the corresponding material maps.
In face image relighting, the reference face image can be warped using face landmark detection or face alignment. In outdoor scenes, however, no such shared structure is readily available, because a scene contains multiple objects. We therefore use a patch match method to warp the reference image to the input image, i.e., to align the reference with the input. The patch match algorithm is similar to the method of Barnes et al. [5]: neighboring patches whose best matches have already been found are used to improve the match of the current patch. The difference from Barnes et al. [5] is that we use 4 neighbor patches instead of 3.
The basic idea is to find, for each patch in the input image, the most similar patch in the reference image and substitute it for the original patch. Two basic assumptions are made: (1) the matches of neighboring patches in the input image are mostly neighbors as well; (2) a large-scale random search region may also contain a matching patch.
We denote the input image as $A$ and the reference image as $B$. The coordinate of a patch is the coordinate of its upper-left corner. The Nearest Neighbor Field (NNF) is defined as a function $f$ whose domain is the set of coordinates of all patches in $A$ and whose value is the offset to the coordinate of the matched patch in $B$. Denoting the coordinate of the original patch in $A$ as $a$ and the coordinate of the matched patch in $B$ as $b$, we have:

$$f(a) = b - a \tag{1}$$

The distance between the original patch and the matched patch is defined as $D(v)$, the distance between the patch at $a$ in $A$ and the patch at $a + v$ in $B$. It is computed as the Euclidean distance [6]. The warping method consists of three steps: initialization, propagation, and random search.
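As an illustration of these definitions, the following Python sketch stores the NNF as a grid of offsets, computes $D(v)$ as a Euclidean patch distance, and runs the three steps named above. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `patch_distance` and `patch_match`, the grayscale list-of-lists image format, the patch size, the iteration count, the default `alpha`, and the uniform random initialization are all choices made for this example. Propagation tries the offsets of the left, upper, and upper-left neighbors, and the random search shrinks its radius by the factor `alpha` at each trial.

```python
import random


def patch_distance(A, B, a, v, size):
    """D(v): Euclidean distance between the patch at a in A and the patch at a + v in B."""
    ax, ay = a
    bx, by = ax + v[0], ay + v[1]
    total = 0.0
    for dy in range(size):
        for dx in range(size):
            diff = A[ay + dy][ax + dx] - B[by + dy][bx + dx]
            total += diff * diff
    return total ** 0.5


def patch_match(A, B, size=3, iters=4, alpha=0.5, seed=0):
    """Return the NNF as a grid of offsets, f[y][x] = (vx, vy)."""
    rng = random.Random(seed)
    # Patches are addressed by their upper-left corner, so these are the
    # numbers of valid patch positions along each axis.
    ha, wa = len(A) - size + 1, len(A[0]) - size + 1
    hb, wb = len(B) - size + 1, len(B[0]) - size + 1

    def in_b(x, y, v):
        return 0 <= x + v[0] < wb and 0 <= y + v[1] < hb

    # Initialization (assumption of this sketch: a uniformly random match in B).
    f = [[(rng.randrange(wb) - x, rng.randrange(hb) - y) for x in range(wa)]
         for y in range(ha)]
    w = max(wb, hb)  # maximum search radius

    for _ in range(iters):
        for y in range(ha):
            for x in range(wa):
                best = f[y][x]
                best_d = patch_distance(A, B, (x, y), best, size)
                # Propagation: offsets of neighbours already visited in this scan.
                candidates = []
                if x > 0:
                    candidates.append(f[y][x - 1])
                if y > 0:
                    candidates.append(f[y - 1][x])
                if x > 0 and y > 0:
                    candidates.append(f[y - 1][x - 1])
                for v in candidates:
                    if in_b(x, y, v):
                        d = patch_distance(A, B, (x, y), v, size)
                        if d < best_d:
                            best, best_d = v, d
                # Random search around v0 with an exponentially shrinking radius.
                v0, radius = best, float(w)
                while radius >= 1:
                    v = (v0[0] + round(radius * rng.uniform(-1, 1)),
                         v0[1] + round(radius * rng.uniform(-1, 1)))
                    if in_b(x, y, v):
                        d = patch_distance(A, B, (x, y), v, size)
                        if d < best_d:
                            best, best_d = v, d
                    radius *= alpha
                f[y][x] = best
    return f
```

Because an offset is replaced only when a candidate has a strictly smaller distance, the summed distance over all patches never increases from one iteration to the next. The warped reference can then be assembled by copying, for each patch position in $A$, the matched patch from $B$.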
Initialization. The initial offset of each patch in $A$ is randomized around the patch.

Propagation. As assumed above, the matches of neighboring patches in the input image are mostly neighbors. We therefore use the neighboring patches whose best matches have already been found to improve the match of the current patch. The offsets $f(x-1, y)$, $f(x, y-1)$ and $f(x-1, y-1)$ are used:

$$f(x, y) = \min\{D(f(x, y)),\ D(f(x-1, y)),\ D(f(x, y-1)),\ D(f(x-1, y-1))\} \tag{2}$$

that is, $f(x, y)$ is set to whichever of these candidate offsets gives the smallest distance.

Random Search. As assumed above, a large-scale random search region may also contain a matching patch. We use a search window whose size decreases exponentially, and test candidate offsets $u_i$ of the form:

$$u_i = v_0 + w \alpha^i R_i \tag{3}$$
where $v_0 = f(x, y)$, $R_i$ is a random point in $[-1, 1] \times [-1, 1]$, $w$ is the maximum search radius, and $\alpha$ is the rate at which the radius declines. The warped results of some reference images are shown in Fig. 3.

We use the WLS filter [7] to decompose an image into a large-scale layer and a detail layer, which can be regarded as the illumination component and the non-illumination component, respectively. Substituting the large-scale layer of the warped reference for the large-scale layer of the input produces the final relit result. An outdoor scene contains various objects made of various materials, so different decomposition parameters should be used for different materials. Each channel $l$ of the input image and the reference image is filtered to a large-scale layer $s$, and the detail layer $d$ is then obtained from $l$ and $s$.

The original WLS filter uses the same smoothness level over the whole image. For our scene relighting task, regions with different materials need different smoothness levels, so we set the smoothness level according to the material of each region. We modified the original WLS [7] as:

$$E = |l - s|^2 + \lambda H(\nabla s, \nabla l)$$

$$H(\nabla s, \nabla l) = \sum_{p} \ldots$$

where $|l - s|^2$ is the data term, which keeps $s$ as similar to $l$ as possible, i.e., minimizes the distance between $l$ and $s$. $H(\nabla s, \nabla l)$ is the regularization (smoothness) term, which makes $s$ as smooth as possible, i.e., minimizes the partial derivatives of $s$. $p$ is a pixel of the image. $\alpha$ controls the affinities by non-linearly scaling the gradients; increasing $\alpha$ results in sharper preserved edges. $\lambda$ is the balance factor between the data term and the smoothness term; increasing $\lambda$ produces a smoother large-scale layer. Here $\nabla l$ is the gradient of $l$, $l_m$ is the material map of $l$, and gray is the gray value of $l_m$:

$$\mathrm{gray} = R \times 0.2989 + G \times 0.587 + B \times 0.114 \tag{8}$$

The minimization of this energy can be solved by off-the-shelf methods such as that of Lischinski [6].
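To make the decomposition concrete, the sketch below solves a one-dimensional weighted-least-squares smoothing problem in Python. It is illustrative only and rests on several assumptions that are not taken from this paper: the smoothness weights use the common form $1 / (|\Delta l|^{\alpha} + \varepsilon)$, the detail layer is taken as the additive residual $d = l - s$, and the material-dependent smoothness level is passed in as a hypothetical list `lam` with one value per pair of adjacent samples, because the mapping from the gray value of the material map to a smoothness level is not reproduced here. Only `gray_value` follows Eq. (8) directly; the names `wls_1d`, `lam`, and `eps` are hypothetical.

```python
def gray_value(r, g, b):
    """Gray value of a material-map colour, as in Eq. (8)."""
    return r * 0.2989 + g * 0.587 + b * 0.114


def wls_1d(l, lam, alpha=1.2, eps=1e-4):
    """Large-scale layer s of a 1-D signal l.

    Minimises sum_p (l[p] - s[p])**2
            + sum_p lam[p] * (s[p+1] - s[p])**2 / (|l[p+1] - l[p]|**alpha + eps),
    where lam[p] is the smoothness level of the link between samples p and p + 1
    (an assumed stand-in for a material-dependent level).
    """
    n = len(l)
    if n < 2:
        return [float(x) for x in l]
    # Link weights: small across strong edges of l, large in flat regions.
    w = [lam[p] / (abs(l[p + 1] - l[p]) ** alpha + eps) for p in range(n - 1)]
    # Setting the gradient to zero gives a tridiagonal system with
    # diagonal 1 + w[p-1] + w[p] and off-diagonals -w[p].
    diag = [1.0] * n
    for p in range(n - 1):
        diag[p] += w[p]
        diag[p + 1] += w[p]
    # Thomas algorithm: forward elimination, then back substitution.
    c = [0.0] * n
    r = [0.0] * n
    c[0] = -w[0] / diag[0]
    r[0] = l[0] / diag[0]
    for p in range(1, n):
        denom = diag[p] + w[p - 1] * c[p - 1]
        if p < n - 1:
            c[p] = -w[p] / denom
        r[p] = (l[p] + w[p - 1] * r[p - 1]) / denom
    s = [0.0] * n
    s[n - 1] = r[n - 1]
    for p in range(n - 2, -1, -1):
        s[p] = r[p] - c[p] * s[p + 1]
    return s


# Usage: a larger smoothness level gives a flatter large-scale layer.
signal = [0.0, 1.0] * 8
smooth = wls_1d(signal, [10.0] * (len(signal) - 1))
detail = [lp - sp for lp, sp in zip(signal, smooth)]  # assumed additive detail
```

Passing different values in `lam` for different stretches of the signal mimics, in one dimension, the idea of smoothing regions of different materials to different degrees. A two-dimensional version leads to a sparse linear system of the same kind, for which a sparse solver would normally be used.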
Finally, substituting the large-scale layer of the warped reference for the large-scale layer of the input produces the final relit result.
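This composition step can be sketched in a few lines of Python. The sketch assumes an additive decomposition, $d = l - s$, which is an assumption of this example rather than something stated in the text, and the function name `relight_channel` is hypothetical. Each argument is one channel stored as a flat list of pixel values.

```python
def relight_channel(input_l, input_s, ref_s):
    """Combine the input's detail layer with the warped reference's large-scale layer.

    input_l: one channel of the input image
    input_s: large-scale layer of that channel
    ref_s:   large-scale layer of the same channel of the warped reference
    """
    # Assumed additive model: detail = channel - large-scale layer.
    detail = [lp - sp for lp, sp in zip(input_l, input_s)]
    return [rs + d for rs, d in zip(ref_s, detail)]


# Usage on a three-pixel channel.
relit = relight_channel([5, 7, 6], [5, 6, 6], [1, 1, 1])
```

Under this assumption, using the input's own large-scale layer as `ref_s` returns the input channel unchanged, which is a quick sanity check for any implementation of the step.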