Implantable metaverse with retinal prostheses and bionic vision processing

Abstract: We present an implantable metaverse featuring retinal prostheses in association with bionic vision processing. Unlike conventional retinal prostheses, whose electrodes are spaced equidistantly, our solution rearranges the electrodes to match the distribution of ganglion cells. To imitate human vision naturally, a scheme of bionic vision processing is developed. Built on a three-dimensional eye model, our bionic vision processing is able to visualize the monocular image, binocular image fusion, and parallax-induced depth map.

Following in the footsteps of Neal Stephenson, whose version of the metaverse is still set in the 21st century, let us try to move the timeline a bit further, say, to a time when a cyborg is no longer science fiction. Futuristic or scary as it may sound, this is happening now. In the dictionary, a cyborg is usually defined as a being with both organic and biomechatronic or bionic body parts [15]. Someone with an artificial cochlea or a cardiac pacemaker is, by this definition, more or less a cyborg. As our life expectancy keeps increasing, it might not be a bad idea to replace dysfunctional body parts with artificial ones. Inspired by this scenario, we hereby introduce an implantable metaverse/AR/VR, which consists of two major components, i.e., retinal prostheses and bionic vision processing.

Retinal prostheses
Retinal prostheses, also known as bionic eyes, are implantable electronic devices that could restore the sensation of vision for individuals with retinal diseases such as retinitis pigmentosa or age-related macular degeneration [16]. To compensate for the loss of photoreceptors in the outer layer of the retina, i.e., rod cells (rods) and cone cells (cones), retinal prostheses employ arrays of microelectrodes or photodiodes to provide electric stimulation to the remaining cells. As the retina is composed of as many as 10 distinct layers [17], retinal prostheses could be mounted at different locations. In most clinical trials, retinal prostheses are placed epiretinally (on the retina) or subretinally (behind the retina) [18]. We prefer the former for two main reasons. First, epiretinal placement allows the prostheses to bypass all other retinal layers and directly stimulate the ganglion cells. Second, the number of ganglion cells or ganglions (0.7 to 1.5 million) is far smaller than that of both rods (92 million) and cones (4.6 million) [19]. Similar to rods and cones, ganglions are also unevenly distributed throughout the retina. In the fovea, the ratio of photoreceptors to ganglions is as small as 5. In the periphery, this ratio could go up to hundreds. By leveraging this property, we arrange the electrodes to proportionally match the density of ganglions, forming a foveated pattern, as shown in Fig. 1. As opposed to photoreceptor-based foveation techniques [20][21][22], ganglion-based foveation could significantly reduce the abundance of pixels, thereby translating into fewer electrodes and lower power consumption.

First principle
To explain how human vision works, a diagram of the simplified visual pathway [23] is drawn in Fig. 2. The whole process of vision can be condensed into four steps. Step 1. Formation of monocular images of both left and right visual fields or fields of vision (FOVs). As distinguished by blue and red colors, the left/right half of the left/right FOV is projected to the right/left hemiretina of the left/right eye. Step 2. Transmission of visual information or images via the optic nerve, optic chiasm, optic tract, lateral geniculate nucleus, and optic radiation. Step 3. Fusion of monocular images into a binocular image on the visual cortex. Step 4. Derivation of the depth of the binocular field, i.e., the overlap of the left and right FOVs. The purpose of our bionic vision processing is to emulate and visualize the above steps. In order for computer-generated images to be correctly or better interpreted by the brain, it is necessary to process the images in a way analogous to the innate visual processing. Otherwise, physiological rejection or VR sickness [24] may occur. Motivated by Dai Vernon [25] and his quote "What the eyes see, the heart must believe.", the key to bionic vision processing is to "fool" the brain. To meet this principle, the images must be natural.

Image plane adjustment
For human vision, suppose there is an object defined by a plane ABCD. Its image projected onto the retina will then be a spherical surface A'B'C'D', as shown in Fig. 3, where the eyeball is approximated as a sphere and, for the sake of symmetry, the object, image, pupil and fovea are center aligned. Note that the orientation of the image shall be opposite to that of the object. For bionic vision, as all optical components preceding the retinal prosthesis are bypassed, both the object and image need to be generated by the computer. In order for the computer to simulate the imaging more efficiently, a three-dimensional (3D) object is first projected onto the plane ABCD, and then the image plane is shifted from the surface A'B'C'D' to the plane A''B''C''D'', which is tangent to the eyeball. Otherwise, the region of the surface A'B'C'D' would vary with the object distance.
Fig. 3. Geometry of object plane, eyeball, and image plane. The eyeball is approximated as a sphere. For the sake of symmetry, the object, image, pupil and fovea are center aligned. Note that the orientation of the image shall be opposite to that of the object.

Field of vision
As shown in Fig. 4, where the angle is measured in degrees, in the horizontal direction the monocular FOV extends to 60° nasally (towards the nose) and to 100° temporally (towards the temple). In the vertical direction, the monocular FOV extends to 60° superiorly (towards the forehead) and to 75° inferiorly (towards the chin) [26][27][28]. The binocular FOV (the overlap between the left and right monocular FOVs) is 120° (horizontal) and 134° (vertical). The blind spot or scotoma, 5.5° (horizontal) by 7.5° (vertical), is located 12-15° temporally and 1.5° inferiorly [29]. By transforming the FOV from the polar coordinate to the image coordinate, as shown in Fig. 5, masks for image manipulations could be obtained.
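As an illustration of how such a mask might be built, the following minimal sketch rasterizes a monocular FOV from the four angular extents quoted above. It rests on two assumptions that are not taken from this work: the FOV boundary is approximated by four elliptical quadrants, and degrees map linearly to pixels. The function name `monocular_fov_mask` and its parameters are hypothetical.

```python
import numpy as np

def monocular_fov_mask(size=201, nasal=60.0, temporal=100.0,
                       superior=60.0, inferior=75.0,
                       eye="left", max_deg=100.0):
    """Boolean mask of the monocular FOV on a size x size grid.

    Assumptions (illustrative only): the boundary is a union of four
    elliptical quadrants, and degrees map linearly to pixels.
    """
    ax = np.linspace(-max_deg, max_deg, size)
    # x grows to the right of the visual field, y grows upward (row 0 = top)
    x, y = np.meshgrid(ax, -ax)
    if eye == "left":
        # for the left eye the temporal side lies to the left (x < 0)
        x_ext = np.where(x < 0, temporal, nasal)
    else:
        x_ext = np.where(x < 0, nasal, temporal)
    y_ext = np.where(y >= 0, superior, inferior)
    return (x / x_ext) ** 2 + (y / y_ext) ** 2 <= 1.0

left_mask = monocular_fov_mask(eye="left")
right_mask = monocular_fov_mask(eye="right")
binocular_mask = left_mask & right_mask  # overlap of both monocular FOVs
```

The binocular mask is obtained here simply as the logical AND of the two monocular masks, which mirrors the definition of the binocular FOV as their overlap.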

Foveated blurring
To demonstrate the foveated blurring with respect to the densities of ganglions, different kernel sizes for convolutions are adopted, as listed in Table 1. According to the fitted ganglion distribution function [30], the average ganglion density at the fovea is about 90 times higher than that at the periphery. Following the rule that the kernel size is inversely proportional to the ganglion density, the foveal region uses a 1 × 1 convolution kernel, i.e., the original pixels are kept intact, while the outermost region uses a 9 × 9 convolution kernel, i.e., each pixel is replaced by the mean value of 9 × 9 pixels. Figure 6 shows the image with four regions being blurred with 1 × 1, 3 × 3, 5 × 5, and 9 × 9 convolution kernels, respectively.
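The region-wise mean filtering described above can be sketched as follows. This is a minimal NumPy illustration for a grayscale image, not the implementation used in this work: the ring boundaries in `radii` are placeholders (the actual regions are those of Table 1), and the helper names `box_blur` and `foveated_blur` are hypothetical.

```python
import numpy as np

def box_blur(img, k):
    """Mean filter with a k x k kernel (k odd); edges are replicated."""
    if k == 1:
        return img.astype(float)
    r = k // 2
    padded = np.pad(img.astype(float), r, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def foveated_blur(img, radii=(0.25, 0.5, 0.75), kernels=(1, 3, 5, 9)):
    """Blur concentric regions with increasing kernel sizes.

    radii are normalised eccentricities separating the four regions;
    the values here are assumed for illustration only.
    """
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ecc = np.hypot((yy - cy) / (h / 2), (xx - cx) / (w / 2))
    region = np.digitize(ecc, radii)  # 0 = foveal region, 3 = outermost
    out = np.empty((h, w), dtype=float)
    for i, k in enumerate(kernels):
        blurred = box_blur(img, k)
        out[region == i] = blurred[region == i]
    return out

rng = np.random.default_rng(0)
image = rng.random((64, 64))
result = foveated_blur(image)
```

Each region is filtered over the whole image and then copied through its mask, which keeps the code short at the cost of some redundant computation.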

Spherical distortion
When projecting the retinal image to the target imaging plane, the FOV cannot approach 180°, as this would make the image infinitely large. To avoid this situation, the maximum horizontal/vertical FOV is limited to 164°, which is wide enough to cover the entire range of the FOV of the eye. From the geometries shown in Fig. 7 and Fig. 8, we could easily calculate the coordinate transformation. As shown in Fig. 7, A is the pupil, O is the center of the eye, and B is the fovea.
If we assume that a point C' is on the eyeball and C is its corresponding point in the horizontal [...]. It should be emphasized that each component of the transformed coordinate is negative when it corresponds to the negative half of the coordinate axis. From the geometric relationship, we [...], where FOV_h and FOV_v represent the maximum horizontal and vertical FOVs of the image, respectively. In Python, PyVista is employed to build a 3D eyeball model and to accurately project the planar or flat image onto its spherical surface.
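For orientation, one plausible form of this transformation is sketched below. It is an assumption, not the paper's own equations: a point on the eyeball is projected through the pupil A onto the plane tangent at the fovea B, and the result is normalised to UV coordinates by the maximum FOVs. The coordinate frame, the 12 mm radius, and the function names are all hypothetical.

```python
import math

def sphere_to_plane(x, y, z, radius=12.0):
    """Project a point on the eyeball through the pupil onto the tangent plane.

    Assumed frame: eye centre O at the origin, pupil A at (0, 0, +radius),
    fovea B at (0, 0, -radius); the image plane is z = -radius.
    """
    if z >= radius:
        raise ValueError("the pupil itself has no image on the plane")
    t = 2.0 * radius / (radius - z)  # ray parameter where A + t(C' - A) hits the plane
    return t * x, t * y

def plane_to_uv(px, py, fov_h_deg=164.0, fov_v_deg=164.0, radius=12.0):
    """Normalise plane coordinates to [0, 1] using the maximum FOVs."""
    half_w = 2.0 * radius * math.tan(math.radians(fov_h_deg / 2.0))
    half_h = 2.0 * radius * math.tan(math.radians(fov_v_deg / 2.0))
    return 0.5 + px / (2.0 * half_w), 0.5 + py / (2.0 * half_h)

# The fovea maps to the image centre
u_center, v_center = plane_to_uv(*sphere_to_plane(0.0, 0.0, -12.0))

# A point seen at 82 deg from the pupil (half of 164 deg) lies at a
# central angle of 164 deg from the fovea and maps to the image edge
a = math.radians(164.0)
u_edge, v_edge = plane_to_uv(*sphere_to_plane(12.0 * math.sin(a), 0.0,
                                              -12.0 * math.cos(a)))
```

Under this assumed geometry the plane coordinate grows as the tangent of the half angle, which is consistent with the image becoming infinitely large as the FOV approaches 180°, and the sign of each component follows the sign of the corresponding axis.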

Visual field transition
Revisiting the visual pathway shown in Fig. 2, we shall mainly focus on how the visual fields or images evolve from the retina to the cortex. On the retina, the image is reversed with respect to its original orientation, i.e., upside down and left-to-right. In addition, the monocular visual field is split into two hemifields, which are separated by the blind spot. On the cortex, the image is reversed back to its original orientation, which is what we actually perceive. Interestingly, since the retina of each eye is connected to both the left and right cerebral hemispheres, there could be two types of so-called binocular images. Type A refers to the unilateral binocular image, which is handled by one hemisphere and comprises two hemifields of the same side (left or right) of both eyes. Type B refers to the bilateral binocular image, which is coordinated by both hemispheres and comprises the two full monocular visual fields of both eyes. Although the two types of binocular images coexist inside the brain, we can only see the latter unless the visual pathway to one hemisphere is completely damaged [23].

Binocular disparity
The depth perception of human vision can be attributed to a variety of monocular and binocular cues [31]. Among the binocular cues, the binocular disparity or parallax is considered the most decisive one. The depth induced by the binocular disparity can be deduced from an epipolar geometry [32], as shown in Fig. 9, where P denotes the point of interest, $O_L$ the center of the left retinal plane, and $O_R$ the center of the right retinal plane. When the left and right retinal planes are rectified, the depth d of point P can be written as
$$d = \frac{b f}{x_1 - x_2},$$
where b is the center distance between the left and right retinal planes, f the focal length of the eye, and $x_1$ and $x_2$ the x-coordinates of the projections of P onto the left and right retinal planes, respectively.
In the case of parallel eyes or lines of sight, b will become the interpupillary distance and f the diameter of eyeball.
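As a small worked example, the standard rectified-stereo relation d = b f / (x1 - x2) can be evaluated directly. The sketch below assumes parallel eyes, with b = 63 mm and f = 24 mm taken from the camera settings given in the Input images section. The function name and the sample disparity of 0.5 mm are hypothetical.

```python
def depth_from_disparity(x1, x2, b=63.0, f=24.0):
    """Depth of a point from its projections on rectified retinal planes.

    x1, x2: x-coordinates (mm) on the left and right planes
    b: distance between the plane centres (mm), f: focal length (mm)
    Returns the depth in mm; zero disparity corresponds to infinity.
    """
    disparity = x1 - x2
    if disparity == 0:
        return float("inf")
    return b * f / disparity

# A disparity of 0.5 mm corresponds to a depth of about 3 m
depth_mm = depth_from_disparity(0.5, 0.0)
```

Because the depth is inversely proportional to the disparity, small matching errors on distant objects translate into large depth errors.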

Input images
To create input images for the retinal prostheses, our bionic vision processing is developed with Unity (Unity Software Inc.). Two physical cameras are used to mimic the eyes viewing distant objects. The focal length (the distance between the camera lens and the sensor) is 24 mm, which approximates the diameter of the eyeball [33]. The sensor size is 179 mm (width) by 179 mm (height). Hence, the camera's FOV is 150° (horizontal) by 150° (vertical). Considering the angular span of the monocular FOV discussed earlier, the bisector of the horizontal FOV of the camera shall be roughly center-aligned with the blind spot. The distance between the two cameras is set to 63 mm, on par with the average interpupillary distance of an adult [34]. In analogy to eyes, the Unity cameras capture images of 3D scenes and flatten them for display, as shown in Fig. 10.
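The quoted FOV can be checked with the pinhole-camera relation FOV = 2 arctan(s / 2f), where s is the sensor size and f the focal length. The short sketch below applies it to the values above. The function name `camera_fov_deg` is hypothetical and is not part of the Unity API.

```python
import math

def camera_fov_deg(sensor_mm, focal_mm):
    """Full field of view (degrees) of a pinhole camera along one axis."""
    return math.degrees(2.0 * math.atan(sensor_mm / (2.0 * focal_mm)))

# 179 mm sensor and 24 mm focal length give roughly 150 degrees
fov = camera_fov_deg(179.0, 24.0)
```

The same relation shows why such a large sensor is needed: with a 24 mm focal length, a 150° FOV requires a sensor several times wider than the focal length.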

3D eye model
The retinal images, which factor in the monocular FOV, foveated blurring, and spherical distortion, are rendered on the spherical surface of a 3D eye model. Since the two eyes are identical, only the left eye is shown in Fig. 11, where the dark spot and empty area of the eye model represent the blind spot and pupil, respectively. Compared to traditional eye models [34], this 3D eye model is rotatable and scalable for a 360° view of the retinal image (see Visualization 1).

Binocular image fusion
As shown in Fig. 12, the unilateral image fusion is performed by splitting the monocular images into the left and right hemiretinal images according to the location of blind spot (Fig. 12

Parallax-induced depth map
Since depth perception is color independent and generated in the cortex, the calculation of depth shall be carried out with monochromatic upright images. Firstly, both the left-eye (Fig. 14(a)) and right-eye (Fig. 14(b)) images are rectified with the epipolar corrections. Secondly, the binocular overlap of the rectified left-eye (Fig. 14(c)) and rectified right-eye (Fig. 14(d)) images is obtained, as shown in Fig. 14(e). Finally, for the case of parallel eyes viewing distant objects, Fig. 14(f) shows the parallax-induced depth map, which is based on the foregoing binocular disparity. It should be mentioned that the false matching points in our results could be decreased not just by optimizing the matching algorithms, but also by tweaking the Unity cameras for more realistic images.
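To make the matching step concrete, the sketch below computes a disparity map from two rectified grayscale images. The matching algorithm is not specified here, so a basic sum-of-absolute-differences block matching is used as a stand-in. The function names, window size, and disparity range are hypothetical.

```python
import numpy as np

def box_sum(a, k):
    """Sum over a k x k window (k odd) around each pixel; edges are replicated."""
    r = k // 2
    padded = np.pad(a, r, mode="edge")
    h, w = a.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def disparity_map(left, right, max_disp=16, k=5):
    """Per-pixel disparity (in pixels) of rectified grayscale images.

    A point at column x in the left image is searched at column x - d
    in the right image; the d with the lowest windowed cost wins.
    """
    left = left.astype(float)
    right = right.astype(float)
    h, w = left.shape
    costs = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        diff = np.abs(left[:, d:] - right[:, :w - d])
        costs[d][:, d:] = box_sum(diff, k)
    return np.argmin(costs, axis=0)

# Synthetic check: the left image is the right image shifted by 4 pixels
rng = np.random.default_rng(1)
right_img = rng.random((40, 60))
left_img = np.roll(right_img, 4, axis=1)
disp = disparity_map(left_img, right_img)
```

A depth map then follows by applying the disparity relation of the Binocular disparity section to every pixel with a non-zero disparity, with the focal length expressed in pixels.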

Conclusions
A design of a retinal prosthesis in conjunction with a bionic vision processing scheme has been conceptually studied. The main contributions of this work are summarized as follows. Contribution 1. A paradigm shift in the field of metaverse/AR/VR, ushering in an implantable device. Contribution 2. Ganglion-based foveation by patterning the electrodes with respect to the density of ganglions rather than photoreceptors. Contribution 3. Bionic vision processing, which is capable of visualizing the spherical distortion, foveated blurring, binocular image fusion, and parallax-induced depth. Contribution 4. An interactive 3D eye model with the retinal images being rendered on the eyeball. Admittedly, retinal prostheses and other implantable devices might not have the chance to be popularized in our time. But believe it or not, this technology will become a reality someday.
Disclosures. The authors declare no conflicts of interest.

Fig. 1. Structure of the proposed retinal prosthesis, of which the stimulation electrodes are arranged to proportionally match the density of ganglions, forming a foveated pattern. As opposed to photoreceptor-based foveation techniques, ganglion-based foveation could significantly reduce the abundance of pixels, thereby translating into fewer electrodes and lower power consumption.

Fig. 2. Diagram of the simplified visual pathway. The whole process of vision can be condensed into four steps. Step 1. Formation of monocular images of both left and right visual fields or FOVs. As distinguished by blue and red colors, the left/right half of the left/right FOV is projected to the right/left hemiretina of the left/right eye. Step 2. Transmission of images via the optic nerve, optic chiasm, optic tract, lateral geniculate nucleus, and optic radiation. Step 3. Fusion of monocular images into a binocular image on the visual cortex. Step 4. Derivation of the depth of the binocular field, i.e., the overlap of the left and right FOVs.

Fig. 5. FOV in the transformed image coordinate, whose axes are measured by the relative length. The area enclosed within the blue/red contour denotes the FOV of left/right eye.

Fig. 7. Geometry to transform the point C' on the spherical surface to the point C on the flattened image.

Fig. 8. Geometry to convert the XYZ coordinate system into the UV coordinate system. In the XYZ coordinate system, the upward green arrow indicates the viewing direction of the image, and the y-axis is aligned with the upward direction in the FOV.

Fig. 9. Epipolar geometry for calculating the depth induced by the binocular disparity, where P denotes the point of interest, $O_L$ the center of the left retinal plane, $O_R$ the center of the right retinal plane, b the center distance between the left and right retinal planes, f the focal length of the eye, and $x_1$ and $x_2$ the x-coordinates of the projections of P onto the left and right retinal planes, respectively.

Fig. 10. Images captured by the (a) left and (b) right cameras in Unity. The FOV of each camera is 150° (horizontal) by 150° (vertical). The distance between the two cameras is set to 63 mm.

Fig. 11. 3D eye model with retinal images rendered on the spherical surface, on which the dark spot and empty area represent the (a) blind spot and (b) pupil, respectively.

Fig. 12. Unilateral binocular image fusion. (a) Retinal image of left eye, (b) retinal image of right eye, (c) rectified images of left hemiretinas of both eyes before fusion, (d) rectified images of right hemiretinas of both eyes before fusion, (e) unilateral image with two right hemifields of both eyes being merged on the left cerebral hemisphere, and (f) unilateral image with two left hemifields of both eyes being merged on the right cerebral hemisphere.

Fig. 13. Bilateral binocular image fusion. (a) Left unilateral image with two right hemifields of both eyes being merged on the left cerebral hemisphere, (b) right unilateral image with two left hemifields of both eyes being merged on the right cerebral hemisphere, and (c) bilateral image with left and right visual fields being merged by both cerebral hemispheres.