Alternative Specular Approach for Real-Time Rendering Optimized for Higher Performance

In this work, two novel techniques are proposed for 3D lighting computed in real time on dedicated video hardware (GPU). Classical techniques such as Phong specular reflections are computationally heavy when executed on budget hardware, performing poorly in real time and reducing battery life. The proposed alternatives are defined in simpler terms yet produce realistic-looking results similar to those of the classical techniques. Numerous experiments are provided in which the proposed techniques are implemented both on the GPU and on the CPU. The performance benchmarks show that the proposed techniques boost performance significantly on budget equipment. The experiments were run on many different computers, on both 32-bit and 64-bit platforms, using single-threaded and multi-threaded approaches to evaluate the real-time performance accurately.


Introduction
In computer software applications where interactive 3D scenes are rendered, it is common to work with shaders running directly on the dedicated video hardware, specifically the graphics processing unit (GPU). Such applications include, but are not limited to, video games, interactive simulations, entertainment, scientific experiments and medical support. In many cases the budget is limited, so the applications must scale to hardware with different processing power, from budget mobile netbooks to powerful high-end workstations. In addition, every attempt is made to render the 3D scene as realistically as possible. In fact, in the movie industry a large set of high-end computer stations is used to create computer-generated video frames, a process that can take from a few minutes to several days. In the video game industry, an application is commonly run on a personal computer or dedicated hardware (such as a gaming console) that has limited computational capabilities.
During the last few years, the computer industry has evolved to assist these applications with powerful GPUs capable of heavy computations, so that more complex 3D scenes can be rendered as fast as possible while still looking close to reality.
The rendering of 3D scenes is commonly done using one of two popular technologies, or APIs: Direct3D provided by Microsoft [1] and OpenGL managed by the Khronos Group [2]. In earlier years the rendering process was implemented for the most part by the API and the underlying hardware, but now it is possible to customize the rendering using shaders that are executed directly on the GPU. In this work, the High-Level Shading Language (HLSL) was used, a proprietary shader language developed by Microsoft for use with the Direct3D API. OpenGL has a similar technology called GLSL. Although shaders written in HLSL for Direct3D can be ported to GLSL, the process is not covered in this work.
In typical 3D lighting approaches, the [...] transformed into world space. The resulting light vector is normalized to make sure it is a unit vector. The angle described earlier assumes that the normal vector is also normalized. The diffuse component in this lighting technique is therefore calculated from this angle and clamped with saturate(), an HLSL function that clamps its argument to the [0, 1] range and is usually optimized for performance when the shader code is compiled to shader assembly. The diffuse color component is then calculated from $C_{in}$, the source interpolated color either taken from the vertex or sampled from the texture. The fake specular reflection component is calculated using the same angle $\alpha$ and two parameters: $p_0$, the apparent reflection strength, and $k_0$, the strength adjustment parameter. Both parameters are calibrated to produce a result that resembles Phong specular reflection. In the context of this work, the parameters were chosen as $p_0 = 0.2$ and $k_0 = 0.5$.
The final color component is then calculated from the diffuse color and fake specular components. It is important to note that this lighting technique works directly with the angles between the surfaces and not with the dot product itself, as is the case in classical diffuse lighting and Phong reflections. The arccosine calculation may reduce performance slightly, but the result is more accurate.
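The building blocks named above (vector normalization, the clamping behavior of saturate(), and the angle between the normal and the light vector) can be illustrated with a minimal Python sketch. It is written under stated assumptions: the helper names are hypothetical, the vectors are plain 3-tuples, and the classical dot-product diffuse term is included only for comparison. The angle-based diffuse and fake specular formulas of this work are not reproduced here.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def saturate(x):
    """Clamp x to the [0, 1] range, mirroring the HLSL saturate() intrinsic."""
    return max(0.0, min(1.0, x))

def light_angle(normal, light_vec):
    """Angle alpha (in radians) between the surface normal and the light vector."""
    n = normalize(normal)
    l = normalize(light_vec)
    # Clamp to [-1, 1] so rounding error cannot push acos outside its domain.
    return math.acos(max(-1.0, min(1.0, dot(n, l))))

def lambert_diffuse(normal, light_vec):
    """Classical dot-product diffuse term (comparison only, not the proposed one)."""
    return saturate(dot(normalize(normal), normalize(light_vec)))
```

For a light directly above the surface the angle is zero and the classical diffuse term is 1; for a light behind the surface the classical term is clamped to 0.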

Fast "Fake" specular technique
It was mentioned earlier that the proposed lighting technique is more accurate in its diffuse component because it works with the angles between the surfaces as opposed to the raw dot product found in many classical lighting techniques. For the sake of performance, the arccosine can be dropped and the entire calculation can be done using dot products instead. The new diffuse component and the corresponding diffuse color component are then calculated in a way similar to how the diffuse component is calculated in the classical techniques. The fake specular component is calculated in two steps, which involve $P'_0$, the adjusted lighting component; max, the HLSL function that returns the maximum of two values; and $\gamma'$, an alpha calibration component used to tune the result so that it roughly matches the more accurate lighting version described earlier. In the context of this work, $\gamma' = 0.9$. The new specular component is then calculated from these, and the final color equation simply combines the newly introduced components. This technique is very simple in mathematical terms and can be calculated with minimal effort on the GPU.
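For reference, the classical Phong specular term that the proposed techniques aim to resemble can be sketched in Python as below. This is the textbook formulation, not the fast "fake" specular technique itself; the function names and the default shininess exponent of 32 are assumptions made for illustration only.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def saturate(x):
    """Clamp x to [0, 1], mirroring the HLSL saturate() intrinsic."""
    return max(0.0, min(1.0, x))

def reflect(incident, normal):
    """Mirror `incident` about `normal` (same convention as HLSL reflect())."""
    d = dot(incident, normal)
    return tuple(i - 2.0 * d * n for i, n in zip(incident, normal))

def phong_specular(normal, to_light, to_viewer, shininess=32.0):
    """Classical Phong specular intensity; `shininess` is a hypothetical default."""
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_viewer)
    # Reflect the incoming light direction (pointing from light to surface).
    r = reflect(tuple(-c for c in l), n)
    return saturate(dot(r, v)) ** shininess
```

The reflection vector, the extra dot product and the power function are the per-pixel costs that make this term comparatively expensive on budget hardware.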

Experimental Results
The lighting techniques described earlier were subjected to numerous experiments with thorough performance benchmarks. In the experiments, many different computers were used, ranging from ultra-mobile netbooks to high-performance workstations. The specification of all testing machines is provided in Table 1.

From the images in Figure 3 it can be seen that although the general location of the specular reflection is different, all three images look similarly shiny and produce the illusion of a metallic look. Should an unprepared viewer look at each of these images, it would be difficult if not impossible to tell whether or not real Phong shading is being used. In addition, it is important to note that the diffuse lighting used in the "fake" specular technique produces higher contrast and a more metallic look; as mentioned earlier, the "fake" specular technique uses a more accurate diffuse approach. As can be seen from the results in Table 2, the computers with weaker and cheaper video cards benefit the most from the proposed alternatives, while high-end video cards benefit little. In some cases, a performance increase of up to 89% is observed when using the fast "fake" specular technique.

In the second application, the different lighting techniques were implemented in a more advanced approach: per-pixel bump mapping in shaders. In this approach, a different P-Q Torus Knot model was rendered, textured with a brick-shaped metallic texture. The previous two applications were executed in shaders running directly on the GPU.
However, in certain mobile devices GPU acceleration is not available. Therefore, in the last application the different lighting techniques were implemented in software running directly on the computer's CPU. The application was compiled in Embarcadero Delphi XE 2 with compiler optimizations enabled and all debugging information disabled. In total, four different approaches were used: 1) single-threaded approach on a 32-bit platform, 2) single-threaded approach on a 64-bit platform, 3) multi-threaded approach on a 32-bit platform and 4) multi-threaded approach on a 64-bit platform.
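The structure of such a CPU benchmark, with a single-threaded and a multi-threaded variant, can be sketched in Python as follows. This is only an illustration and not the Delphi code used in the experiments: shade_vertex is a hypothetical stand-in for a per-vertex lighting routine, and the batch size and thread count are free parameters. In standard CPython the global interpreter lock prevents threads from executing Python bytecode in parallel, so the sketch shows the shape of the measurement rather than a realistic speedup.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def shade_vertex(i):
    """Hypothetical stand-in for a per-vertex lighting routine."""
    x = (i % 1000) / 999.0
    return max(0.0, min(1.0, x))

def shade_batch(count):
    """Compute the color value for `count` vertices and return their sum."""
    return sum(shade_vertex(i) for i in range(count))

def run_single(count):
    """Time one batch on the calling thread; returns (result, seconds)."""
    start = time.perf_counter()
    total = shade_batch(count)
    return total, time.perf_counter() - start

def run_threaded(count, threads):
    """Time `threads` batches, one per worker thread; returns (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        totals = list(pool.map(shade_batch, [count] * threads))
    return totals, time.perf_counter() - start
```

Calling run_single(1_000_000) and run_threaded(1_000_000, 64) would mirror a single-threaded and a multi-threaded run; smaller values are advisable for a quick check.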
In the single-threaded approaches, the program calculated the final color for one million vertices. In the multi-threaded approaches, the program calculated the final color for one million vertices in each of the 64 threads that were run simultaneously. The execution time was evaluated for each instance. The benchmarking results for the single-threaded variant are shown below. As can be seen in Table 5, the performance increase from using the proposed techniques is also significant when multi-threading is used, although to a lesser degree (most likely due to cache pollution on CPUs with a small cache). It is also important to note that multi-threading greatly increases the performance in some cases.
[2] OpenGL Architecture Review Board.

Advanced Lighting and Materials with Shaders. Jones & Bartlett Publishers, 2004.
figure below. It can be observed in Figure 1 that the diffuse lighting assumes the scattering of the

Figure 1. (a) A rough surface reflects light diffusely; (b) a plane surface produces specular reflection.

Figure 2. (a) Geometry of reflection on a planar surface; (b) geometry of reflection on a curved surface.

The implementation of shaders for diffuse lighting and specular reflections is discussed in great detail in the popular literature (e.g. [1] and [3]), commonly denoted as the Phong lighting technique, although other lighting techniques exist [4].
Table 1. The specification of all testing machines with their abbreviations that are used as references in the experimental result tables.
Eee8G: Asus Eee PC 8G, Intel Celeron 630 MHz, DDR2-570 MHz, Intel GMA900
Eee10HE: Asus Eee PC 1000HE, Intel Atom N280 1.66 GHz, DDR2-667 MHz, [...]

Three different applications were used in the experiments. The first one used HLSL shaders for rendering a P-Q Torus Knot [11] filled with a single diffuse color (no texturing); the model has 16929 vertices and 32768 faces. The resolution of the rendered image was 512x512 with 8x multisampling, if supported by the video card. The model was chosen so that the polygon count is low and the vertex shader

Figure 3. P-Q (7-4) Torus Knot rendered using a) Phong reflections, b) "Fake" specular and c) Fast "fake" specular.

The model has 10593 vertices and 20480 faces, again a low polygon count so that the majority of the work is done in the pixel shader. The rendered image resolution was 512x512 with 8x multisampling. On desktop machines the 960x960 resolution was used instead. The resulting rendering is shown below.

Figure 4. P-Q (3-4) Torus Knot rendered using per-pixel bump mapping with a) Phong reflections, b) "Fake" specular and c) Fast "fake" specular.

From the images in Figure 4 it can be seen that the proposed techniques produce realistic-looking results which, to an inexperienced viewer, may look the same. Performance with the proposed techniques, however, is a different story. It can be noted that the "fake" specular technique, which uses more accurate diffuse shading, produces an image with higher contrast and a more metallic look. The performance results are shown below.
in some special circumstances (such as on 64-bit CPUs) and faster in others (GPU, 32-bit CPUs) it produces realistic results. The fast "fake" lighting technique is drastically faster than the traditional Phong technique. A special LOD-based approach can be used to mix one of the proposed alternatives with the classical Phong technique in a hybrid approach, where distant objects use the faster alternative and closer objects are rendered with the slower classical technique. In the majority of cases it is difficult for an inexperienced viewer to determine visually that the technique used is not a true Phong reflection; the only way to tell would be to look at the light's origin and then at the object to see that the reflection actually points back to the light's origin. This last issue could possibly be mitigated by using two light origins per light: the original position for the diffuse component, and a second one, calculated as the average between the viewer's position and the light's origin, for the specular component, simulating the moving reflection.
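The two ideas in the preceding paragraph, the LOD-based choice of technique and the second light origin placed halfway between the viewer and the light, can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function names, the technique labels and the distance threshold of 50 units are hypothetical and would need tuning for a real scene.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def specular_light_origin(light_pos, viewer_pos):
    """Average of the light and viewer positions, used only for the specular term."""
    return tuple((l + v) / 2.0 for l, v in zip(light_pos, viewer_pos))

def choose_technique(object_pos, viewer_pos, lod_distance=50.0):
    """Pick the classical technique for close objects, the faster one otherwise.

    `lod_distance` is a hypothetical threshold, not a value from the experiments.
    """
    if distance(object_pos, viewer_pos) <= lod_distance:
        return "phong"
    return "fast_fake"
```

The diffuse component would keep using the original light position, while the specular component would use the value returned by specular_light_origin, so the highlight shifts as the viewer moves.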

Table 2. Performance benchmarks for the first application with different shading techniques running directly on the GPU using shaders. The values are specified in frames per second.

Table 3. Performance benchmarks for the second application with different shading techniques running directly on the GPU using shaders, illustrating per-pixel bump mapping. The values are specified in frames per second.