HORN-6 special-purpose clustered computing system for electroholography

We developed the HORN-6 special-purpose computer for holography. We designed and constructed the HORN-6 board to handle an object image composed of one million points and constructed a cluster system composed of 16 HORN-6 boards. Using this HORN-6 cluster system, we succeeded in creating a computer-generated hologram of a three-dimensional image composed of 1,000,000 points at a rate of 1 frame per second, and a computer-generated hologram of an image composed of 100,000 points at a rate of 10 frames per second, which is near video rate, when the size of the computer-generated hologram is 1,920 × 1,080. The calculation speed is approximately 4,600 times faster than that of a personal computer with an Intel 3.4-GHz Pentium 4 CPU.


Introduction
Holography is a technique that may be used to achieve three-dimensional (3-D) television. This technique involves recording a 3-D image on a hologram using the interference of light and then reconstructing the image using diffracted light. Holograms generated by numerical calculation are referred to as computer-generated holograms (CGHs) [1]. Electroholography involves the reconstruction of a 3-D image using an electronic device such as a liquid crystal display (LCD) to display the hologram [2][3][4]. An animated 3-D image can be reconstructed by switching the CGH at high speed, and interactive holographic 3-D displays based on this approach have been studied by research groups such as the MIT Media Laboratory [5][6][7][8].
We have developed the HORN (HOlographic ReconstructioN) special-purpose computer for calculating CGHs at high speed [9][10][11][12][13]. Using the fifth prototype, HORN-5 [14], we succeeded in generating CGHs for display on an LCD with a size of 1,920 × 1,080. Due to the hardware design and resources, however, HORN-5 has a limitation in that the number of points that make up the 3-D image is less than approximately 10,000 (≈ 2^14).
In the present study, we developed the HORN-6 board, a special-purpose computing board for holography, which is able to handle a 3-D image composed of one million points. The improvement over the HORN-5 board was mainly the addition of external memory modules in order to expand the 3-D image. In addition, we constructed the HORN-6 cluster system using 16 HORN-6 boards.
The remainder of the present paper is organized as follows. Section 2 describes an overview of the proposed electroholography system. Section 3 describes the hardware design and implementation of the HORN-6 board, including the improvement from the HORN-5 board. In addition, Section 3 describes the cluster system composed of parallelized HORN-6 boards. Section 4 describes the computational performance of the HORN-6 cluster system and the reconstruction of 3-D animations composed of 10,000, 50,000, and 100,000 points. Section 5 summarizes the results of the present study and discusses future research.

Computer-generated hologram
The light intensity of the CGH is evaluated using Eqs. (1) and (2) when the reference light is parallel light and the object is sufficiently distant from the CGH [15]. In these equations, I(x_α, y_α) is the light intensity at the point (x_α, y_α) on the hologram, (x_j, y_j, z_j) are the coordinates of the j-th point of the object image, N is the number of points in the object image, A_j is the amplitude of the object light, λ is the wavelength of the reference light, and r_αj is the distance between the point on the CGH and the object point. Equation (1) shows that the calculation cost of the CGH is proportional to M × N, where M is the size of the CGH. For example, the calculation cost is approximately 200 billion calculations when N is 100,000 and M is two million.
We therefore used an algorithm that can be computed at high speed and that is suitable for a special-purpose computer [16]. This algorithm is described by Eqs. (3)-(6), in which the coordinate data x, y, and z are normalized by p, the dot pitch of the display device. The light intensity I(x_{α+n}, y_α) on the hologram can be evaluated from I(x_α, y_α) using Eqs. (3)-(6). In other words, the light intensity of the next point can be obtained by simply repeating the addition in Eq. (4).
Figure 1 shows the optical system used for electroholography in the present study. The data of the object image are saved on a personal computer (PC). Initially, the calculator unit generates the CGH, which is displayed on the reflective LCD via the LCD controller. When the LCD is illuminated with the reference light, the 3-D image is reconstructed on the output lens. Figure 2 shows a 3-D image reconstructed in this way. The left-hand image is the original image, the center image is the CGH, and the right-hand image is the reconstructed image. Note that the image shown here is composed of one million points and was generated by the clustered HORN-6 system. An Aurora Systems ASI6201 was used as the reflective LCD. The size of the LCD is M = 1,920 × 1,080, and the dot pitch is p = 6.4 μm. The distance between the LCD and the reconstructed image is 1 m, and the viewing angle is approximately 5°.
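The idea that the next pixel's intensity follows from additions alone can be illustrated with a minimal sketch. It is not a reproduction of Eqs. (3)-(6), which are not shown in this text. It assumes the Fresnel approximation, in which the phase (in cycles) of object point j at hologram pixel (x_α, y_α) is p/(2λz_j) · ((x_α − x_j)² + (y_α − y_j)²) with all coordinates normalized by p. Because this phase is quadratic in x_α, its second difference is constant, so stepping one pixel to the right needs only two additions per object point. The function names, the variable names theta, delta, and gamma, and the test values are illustrative assumptions.

```python
import numpy as np

def cgh_row_direct(points, amps, y_alpha, width, p, lam):
    """Evaluate one hologram row directly (Fresnel approximation, assumed form).

    points: (N, 3) array of object coordinates (x, y, z), normalized by p.
    Returns `width` intensities for x_alpha = 0 .. width-1.
    """
    x = np.arange(width)[:, None]                 # hologram x coordinates
    xj, yj, zj = points[:, 0], points[:, 1], points[:, 2]
    # Phase in cycles; the constant z term is dropped.
    theta = p / (2.0 * lam * zj) * ((x - xj) ** 2 + (y_alpha - yj) ** 2)
    return (amps * np.cos(2.0 * np.pi * theta)).sum(axis=1)

def cgh_row_recurrence(points, amps, y_alpha, width, p, lam):
    """Same row, but the phase of each next pixel is obtained by additions only."""
    xj, yj, zj = points[:, 0], points[:, 1], points[:, 2]
    dx0 = 0.0 - xj                                # offset at x_alpha = 0
    gamma = p / (lam * zj)                        # constant second difference
    theta = 0.5 * gamma * (dx0 ** 2 + (y_alpha - yj) ** 2)  # phase at x_alpha = 0
    delta = 0.5 * gamma * (2.0 * dx0 + 1.0)       # first difference at x_alpha = 0
    row = np.empty(width)
    for n in range(width):
        row[n] = np.sum(amps * np.cos(2.0 * np.pi * theta))
        theta = theta + delta                     # phase at the next pixel
        delta = delta + gamma                     # first difference at the next pixel
    return row

# Illustrative comparison with arbitrary test points (not data from the paper).
rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(0, 1920, 50),
                       rng.uniform(0, 1080, 50),
                       rng.uniform(1.5e5, 1.6e5, 50)])
amps = np.ones(50)
direct = cgh_row_direct(pts, amps, 10, 64, 6.4e-6, 632.8e-9)
stepped = cgh_row_recurrence(pts, amps, 10, 64, 6.4e-6, 632.8e-9)
print(np.max(np.abs(direct - stepped)))
```

In this sketch the two functions agree to within floating-point rounding. The direct version performs a multiplication-heavy phase evaluation for each of the M × N pixel-point pairs (1,920 × 1,080 × 100,000 ≈ 2.07 × 10^11 in the example above), whereas the stepped version replaces most of that work with additions, which is the property that suits a hardware pipeline.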
Four Xilinx XC2VP70-5FF1517C FPGA chips are installed on the HORN-6 board for the calculation. Each of these FPGA chips has 738 Kbytes of memory and a logic capacity equivalent to seven million gates. In addition, each FPGA chip for the calculation can be connected to a 256-Mbyte Double Data Rate Synchronous Dynamic Random Access Memory (DDR-SDRAM).

HORN-6 board
Another FPGA chip, a Xilinx XC2V1000-5FG456C, is installed on the board for communication. This FPGA chip has 90 Kbytes of memory and a logic capacity equivalent to one million gates. It connects the Peripheral Component Interconnect (PCI) local bus of the host PC to the four FPGA chips for the calculation. Figure 4 shows a block diagram of HORN-6. To simplify the description of the hardware design, Fig. 4 shows the FPGA chip for communication and only one of the FPGA chips for the calculation; the other three FPGA chips for the calculation are omitted.

Special-purpose computer for electroholography, HORN-6
The PCI controller in the FPGA chip for communication connects the PCI local bus of the PC to the FPGA chips for the calculation. The HORN controller is the top module in the FPGA used for the calculation. This controller sends the instruction to begin calculation to the HORN core and controls the communications between the modules, including the DDR-SDRAM controller. The HORN core is the module that calculates Eqs. (3)-(6). In HORN-6, a total of 320 modules that calculate Eqs. (3) and (4) are installed. The design concept for the calculation in HORN-6 is similar to that in HORN-5, which is described in detail in [14]. In the FPGA used for the calculation, the circuit that communicates with the PCI local bus operates at 33 MHz and the computational circuit operates at 133 MHz.
Fig. 3. HORN-6
The DDR-SDRAM controller is the module for communications between the FPGA chip for the calculation and the DDR-SDRAM. In HORN-6, each FPGA chip for the calculation can communicate with a DDR-SDRAM that has a capacity of 256 Mbytes. Since the HORN-5 system has no external memory, only the 738 Kbytes of memory inside the FPGA chip can be used for calculation. As a result, the amount of memory that can be used in HORN-6 is approximately 350 times that in HORN-5. In theory, an object image composed of 10 million points can be handled when the amount of memory is 256 Mbytes. Moreover, we designed and implemented the clock signal controller using a Digital Clock Manager (DCM), because various clock signals are used to control the DDR-SDRAM.
By designing and implementing the DDR-SDRAM controller, we succeeded in handling large volumes of data in HORN-6. However, to increase the number of object points that HORN-6 can handle, we must also account for the error in the summation in Eq. (3). Therefore, we increased the data length of the light intensity in the HORN core.
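The memory figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes binary units (1 Kbyte = 1,024 bytes), and the per-point budget is simply what "10 million points in 256 Mbytes" implies, not a data format stated in the paper.

```python
KB = 1024            # assumption: binary units
MB = 1024 * KB

ddr_bytes = 256 * MB       # external DDR-SDRAM per calculation FPGA (from the text)
onchip_bytes = 738 * KB    # on-chip FPGA memory (from the text)

ratio = ddr_bytes / onchip_bytes
bytes_per_point = ddr_bytes / 10_000_000   # budget implied by 10 million points

print(f"memory ratio: {ratio:.0f}x")                      # memory ratio: 355x
print(f"implied budget: {bytes_per_point:.1f} bytes/pt")  # implied budget: 26.8 bytes/pt
```

With decimal units instead (256 × 10^6 and 738 × 10^3 bytes) the ratio is about 347, so either convention is consistent with "approximately 350 times".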
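One way to see why a longer data word is needed as the number of object points grows is to count the bits required to hold a sum of N terms without overflow. The sketch below is a generic fixed-point estimate, not the HORN-6 design: the paper does not state the word lengths used, so the 8-bit term width is a placeholder assumption and `accumulator_bits` is a hypothetical helper.

```python
def accumulator_bits(term_bits: int, n_terms: int) -> int:
    """Bits needed to sum n_terms unsigned values of term_bits each without overflow."""
    max_sum = n_terms * (2 ** term_bits - 1)   # largest possible sum
    return max_sum.bit_length()

# Placeholder term width of 8 bits (an assumption, not a HORN-6 parameter).
print(accumulator_bits(8, 10_000))      # 22
print(accumulator_bits(8, 1_000_000))   # 28
```

Under this assumption, going from roughly 10,000 points to one million points adds six or seven bits to the accumulator, that is, about log2 of the increase in N.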
Moreover, we improved the HORN controller. It takes a very long time to perform the calculation of a CGH in the HORN core after reading out data from the DDR-SDRAM. Therefore, after the first thousand points of coordinate data have been read out from the DDR-SDRAM, the thousand points of coordinate data to be calculated next are read out from the DDR-SDRAM while the HORN core processes the data that have already been read out. Because the time required to read out data from the DDR-SDRAM is shorter than the computing time in HORN-6, the communication time is hidden by the computing time of the CGH.
[...] boards via each PC. To generate a CGH by parallel computation, the area of the CGH to be calculated is divided among the FPGA chips on the HORN-6 boards. Table 1 shows the specifications of the PCs of the cluster system.
Table 2 shows the time for generating a CGH when using the HORN-6 cluster system and compares it with the computing time when using only a PC (3.4-GHz Pentium 4). The specifications of this PC are the same as those of the PC used for the calculation in Table 1. The size of the CGH is 1,920 × 1,080 (approximately two million), and the unit of time in Table 2 is seconds. The computing time in Table 2 includes the transmission time of the object data from the host PC to each client PC, the computing time of the CGH, and the transmission time of the CGH from each client PC to the host PC. Additionally, the transmission time of a CGH from each client PC to the host PC is [...]
The computing time of the CGH for the object image composed of 100,000 points is approximately 0.1 seconds using 16 HORN-6 boards. The CGH was generated with a speed-up of 4,600 times compared to using one PC (3.4-GHz Pentium 4) and a speed-up of 13 times compared to using one HORN-6 board. Furthermore, we succeeded in generating a CGH of the object image composed of one million points in approximately 0.99 seconds (see Fig. 2 or Fig. 10). Table 2 shows that the performance of the cluster system increased in proportion to the increase in the number of object points.
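The division of the CGH area among the FPGA chips mentioned above can be sketched as follows. The paper does not say how the area is divided, so the row-wise split and the helper name `split_rows` are assumptions; only the counts (16 boards, four calculation FPGAs per board, 1,080 rows) come from the text.

```python
def split_rows(total_rows: int, n_workers: int) -> list[tuple[int, int]]:
    """Split rows 0..total_rows-1 into n_workers contiguous (start, stop) ranges."""
    base, extra = divmod(total_rows, n_workers)
    ranges = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers take one additional row each.
        stop = start + base + (1 if w < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

chips = 16 * 4                     # 16 boards x 4 calculation FPGAs (from the text)
ranges = split_rows(1080, chips)   # row-wise split is an assumption
sizes = [stop - start for start, stop in ranges]
print(ranges[0], ranges[-1], min(sizes), max(sizes))
```

Since 1,080 is not a multiple of 64, the chips in this sketch receive either 16 or 17 rows each. The reported speed-up of 13 times on 16 boards corresponds to a parallel efficiency of about 13/16 ≈ 0.81, which is consistent with the data transmission times being included in the measured computing time.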

Computational performance
Moreover, Fig. 6 shows a graph of the frame rate of the HORN-6 cluster system as the number of object points changes. In the present paper, the frame rate is defined as the reciprocal of the computing time in Table 2, and it does not include the time that elapses before a CGH is displayed on the LCD. Real-time reconstructions of approximately 50,000 points at 20 fps and of approximately 100,000 points at 10 fps are possible. Figures 7-9 show computer graphics (CG) images and reconstructed movies of object images composed of 10,000 to 100,000 points. In these figures, the left-hand sides are CG images, and the right-hand sides are movies reconstructed using electroholography. The source for the reference light is a He-Ne laser (λ = 632.8 nm). The size of the reconstructions is approximately 5 cm × 5 cm × 5 cm. We succeeded in reproducing clear reconstructed movies, and the reconstructed movies can be observed directly when a light-emitting diode is used as the source for the reference light. In addition, all of the reconstructed movies in Figs. 7-9 can be reproduced at a rate of 10 fps or more.
Moreover, Fig. 10 shows the CG image and the reconstructed image of an object image composed of one million points using electroholography. This image is reconstructed from the CGH generated using HORN-6. We successfully reconstructed high-definition images using HORN-6.
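The frame-rate definition above (the reciprocal of the computing time) can be applied directly to the computing times quoted in the text. The sketch uses only the two times stated in the prose, approximately 0.1 s for 100,000 points and approximately 0.99 s for one million points; the function name is illustrative.

```python
def frame_rate(computing_time_s: float) -> float:
    """Frame rate as defined in the text: reciprocal of the CGH computing time."""
    return 1.0 / computing_time_s

# Computing times quoted in the text for the 16-board cluster.
for n_points, seconds in [(100_000, 0.1), (1_000_000, 0.99)]:
    print(f"{n_points:>9,d} points: {frame_rate(seconds):.2f} fps")
```

By the same definition, the quoted 20 fps for approximately 50,000 points corresponds to a computing time of about 0.05 s.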

Conclusion and future research
We developed HORN-6, which can handle object images composed of one million points, and achieved a computational speed of 10 fps for an object image composed of 100,000 points using 16 HORN-6 boards.
In the future, we intend to improve the processing power of the system so as to reconstruct object images composed of one million points in real time. We also intend to make it possible for HORN-6 to handle 3-D images with occlusion.

Acknowledgments
The present research was supported in part by a Grant-in-Aid for JSPS Fellows (21·4148).