A planar compound eye based microsystem for high precision 3D perception

Three-dimensional (3D) panoramic vision plays a fundamental role in how living organisms perceive external information, and it naturally becomes a key system through which embodied intelligence interacts with the outside world. A binocular vision system with rotating eyeballs has a long baseline, a large volume and weak sensitivity to motion. A compound eye system has a small volume and high sensitivity to motion, but poor precision. Here, a planar compound eye microsystem for high-precision 3D perception is proposed by combining semiconductor manufacturing processes with the structure of the biological compound eye. Using a semiconductor planar image sensor as the sensing unit, a space-coded planar sub-eye array is designed, and its sub fields of view (FOVs) are dynamically mapped onto the image sensor. This resolves the problem that a traditional vision system cannot simultaneously provide a wide FOV with a long focal length, and high sensitivity to motion with high resolution. The parallax among different sub-eyes enables a single compound eye to accurately perceive and dynamically track the 3D position of a target within a range of 10 m and a FOV of 120°. This system is of great significance for intelligent robots and intelligent perception.


Introduction
Visual information perception is an important sign of biological evolution. By acquiring visual information, animals can grasp and dynamically predict external information in more detail and more accurately, and achieve more accurate predation, attack and avoidance [1]. This has significantly promoted the development of biological intelligence. Some animals, especially large animals such as human beings, have a binocular vision system composed of a pair of eyeballs [2]. Light emitted or reflected by the target is refracted by the cornea and lens and converges on the retina. The imaging parallax between the two eyes can be used for three-dimensional (3D) perception, positioning and tracking of targets. Such a system achieves 3D panoramic perception of the outside world by rotating the eyeballs and neck, and perceives near or far targets by adjusting the curvature of the lens [3]. However, its motion structure is complex, the long baseline between the two eyes results in a large volume, and a large amount of information must be processed to obtain the accurate position of the target. Some animals, such as fish, achieve wide field of view (FOV) perception through a fisheye system [1,4]. Such systems have a gradient-index spherical lens and achieve panoramic perception without eye movement. However, the distortion at the edge of the FOV is too large, and the accuracy of the perceived information is poor. The shape of the fisheye lens cannot be changed, so it can only perceive nearby targets [5]. Meanwhile, most animals in nature, such as insects and crustaceans, achieve 3D panoramic perception through lightweight compound eye structures [6,7]. The compound eye is a multi-view stereo vision system composed of a series of ommatidia facing different directions. Ommatidia distributed over a curved surface acquire information over a panoramic FOV. The imaging resolution of a compound eye is determined by the number of ommatidia; it requires little computing power and has a strong dynamic perception ability. In addition, the small aperture of each ommatidium gives the compound eye system the imaging advantage of an infinite depth of field [8]. However, the short focal length and the short baseline between ommatidia lead to low perception accuracy and a short 3D perception distance range.
Inspired by the structure of biological compound eyes and building on existing photoelectric detection technology, a series of artificial compound eyes has been developed by combining biotechnology and information technology. By function, artificial compound eyes fall into two categories: those for two-dimensional (2D) image recovery and reconstruction [9][10][11][12], and those for 3D target perception and tracking. In existing artificial compound eyes for 3D target perception, the optical system is composed of curved sub-eyes that acquire information over a panoramic FOV, as in insects. Machining curved optical systems is not compatible with current planar semiconductor processes, so they are difficult to manufacture [13]. Some of these systems have curved detectors modeled on insect neurons [9,14]: using flexible electronics manufacturing, each sensing unit of the detector is matched one-to-one to a sub-eye, yielding mosaic-like information with a low sampling rate. Others use a common planar image sensor as the sensing device [15][16][17][18][19]; however, because such a sensor does not match the curved optical system [20], a complex relay or waveguide system must be added [15,16,21]. Existing approaches to target perception follow two principles. The first codes the light intensity distribution of the target image at the sensing unit for different target positions, establishing a correspondence between the code and the target position [15][16][17][18][22][23][24][25]. However, the morphology of the image spot is closely related to the shape and light intensity of the target, so this code is not universal. The second uses the least squares method to find the intersection of the light incident vectors of different sub-eyes, which determines the 3D position of the target [26][27][28][29]. However, the short focal length of the sub-eyes leads to inaccurate measurement of the light incident vector, which limits the perception accuracy and the perception range of the system. The measurement distance range of existing compound eye systems is from tens to hundreds of millimeters, similar to that of insects (hundreds of millimeters) [30,31]. Since some of these systems have a baseline much larger than that of the insect compound eye, the distance-baseline-ratio of existing 3D compound eye measurement systems is relatively small.
Here, a compound eye microsystem that combines the biological compound eye structure with the planar semiconductor process is developed. In this microsystem, the coding of the acquired information is moved forward from the sensing unit to the optical system, and a planar compound eye optical system with spacing-coded sub-eyes is designed that can be manufactured by conventional micro-electromechanical systems (MEMS) processing. Targets in different FOV regions are modulated by the coded sub-eyes and imaged onto the same planar image sensor. The microsystem obtains the 3D position of the target from the parallax between different pairs of sub-eyes, achieving position perception at meter-level distances with a millimeter-level baseline. By using long-focal-length sub-eyes and multiplexing the image sensor, the planar optical system breaks the mutual constraint between FOV and resolution in traditional binocular vision systems. In addition, the sub-eye coding rules and aperture shapes of the compound eye optical system are optimized for better perceptual performance. The microsystem has a small size, a large FOV, a large measurement distance range, a large distance-baseline-ratio and low computing power requirements. This work provides an accurate 3D perception measurement method and system for micro embodied intelligent robots and unmanned aerial vehicles (UAVs). The 3D target position information that the microsystem provides to embodied intelligent robots and UAVs has broad application value in visual 3D positioning, target tracking, obstacle avoidance, formation and other fields, as shown in Figs 1 (a) and (d).

3D perception principle
The microsystem consists of a planar compound eye optical system with coded sub-eyes and a planar image sensor, as shown in Fig. 1 (b). Light emitted or reflected by a close-range target (one whose angle relative to the sub-eye is less than 1/10 of the sub-eye's FOV) passes through the optical system and forms an array of coded images, which is received by the image sensor. Following the image sensor multiplexing principle [32], the area of the optical system is much larger than that of the image sensor, so targets across a wide FOV are imaged onto the image sensor. Because the sub-eyes on the optical system are arranged according to a fixed code, targets in different orientations produce image patterns with different codes, and this code is used to match the collected images to the coded sub-eyes. On the image sensor, the target appears as a diffuse light spot covering several pixels, and determining the centroid of the spot achieves a resolution finer than the pixel size [33], similar to the concept of hyperacuity in biology [34].
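The sub-pixel centroiding step can be illustrated with a plain intensity-weighted centroid. This is a minimal sketch: the text cites [33] for centroiding but does not specify the estimator, so the function name `spot_centroid`, the optional background threshold and the synthetic Gaussian spot are assumptions for illustration.

```python
import numpy as np


def spot_centroid(image, threshold=0.0):
    """Intensity-weighted centroid (x, y) of a light spot, in pixel units.

    Hypothetical helper: pixels at or below `threshold` are ignored, a simple
    (assumed) way to suppress background before weighting.
    """
    img = np.asarray(image, dtype=float)
    weights = np.where(img > threshold, img - threshold, 0.0)
    total = weights.sum()
    if total == 0:
        raise ValueError("no pixels above threshold")
    rows, cols = np.indices(img.shape)  # row index = y, column index = x
    x_c = (weights * cols).sum() / total
    y_c = (weights * rows).sum() / total
    return x_c, y_c


# Usage: a Gaussian spot a few pixels wide, centred between pixel centres.
yy, xx = np.indices((21, 21))
spot = np.exp(-((xx - 10.3) ** 2 + (yy - 9.7) ** 2) / (2 * 1.5 ** 2))
x_c, y_c = spot_centroid(spot)  # both within 1e-3 px of (10.3, 9.7)
```

Even though the spot is sampled only at integer pixel positions, the weighted mean recovers a fractional position, which is the sense in which centroiding resolves below the pixel pitch.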
When the target is infinitely far away, the incident light is parallel, and each pair of matched sub-eye and target image has the same direction vector. In this case, the collected information contains no 3D position information, only the 2D orientation of the target in the microsystem coordinate system, which is the case treated in the previous work [32]. When the target is at a finite distance, its light propagates as a spherical wave rather than a plane wave, so the direction vectors of the matched sub-eye and target image pairs differ, and they contain the 3D position of the target. The diverging light makes the imaging distance differ from the sub-eye aperture distance, as shown in Fig. 1 (c). According to the parallax principle, the coordinates of a target P(x_p, y_p, z_p) at a finite distance can be expressed as Equation (1) (see Supplementary Note 1 for the derivation), where f is the focal length of the microsystem, n is the number of target images collected in a single measurement, x_si and y_si (i ∈ 1, …, n) are the x and y coordinates of the centroids of the collected target images, x_ai and y_ai (i ∈ 1, …, n) are the x and y coordinates of the centers of the sub-eye apertures corresponding to the collected target images, and k and l are the numbers of pairs in the x and y directions selected by different solving algorithms.
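As a worked illustration of the similar-triangle geometry in Fig. 1 (c), consider an idealized pinhole model. This is an assumption, not necessarily the coordinate convention behind Equations (1) to (3): the sub-eye apertures lie in the plane z = 0 with centers (x_ai, y_ai), the sensor plane is parallel to it at z = -f, and the target P(x_p, y_p, z_p) has depth z_p > 0 measured from the aperture plane. The ray through the center of aperture i hits the sensor at (x_si, y_si), and subtracting two such relations eliminates the lateral position of the target:

```latex
\begin{aligned}
x_{si} &= x_{ai} + \frac{f}{z_p}\,(x_{ai} - x_p), \qquad
y_{si} = y_{ai} + \frac{f}{z_p}\,(y_{ai} - y_p),\\
x_{si} - x_{sj} &= \frac{z_p + f}{z_p}\,(x_{ai} - x_{aj})
\;\Longrightarrow\; R = \frac{z_p + f}{z_p},\\
z_p &= \frac{f}{R - 1}, \qquad
x_p = \frac{R\,x_{ai} - x_{si}}{R - 1}, \qquad
y_p = \frac{R\,y_{ai} - y_{si}}{R - 1}.
\end{aligned}
```

In this model R tends to 1 as z_p grows without bound, so the imaging distance equals the aperture distance and only the 2D orientation remains, consistent with the infinite-distance case described above.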
Based on the similar triangles in Fig. 1 (c), the ratio R is defined in Equation (2). For a target at a fixed position, R is the same in every area of the collected image and depends only on the depth z_p of the target and the focal length f of the system. Neglecting errors, R can be obtained from any pair of matched target images and their sub-eye coordinates. Combined with Equation (2), the coordinates of P(x_p, y_p, z_p) can be expressed as Equation (3).
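The ratio-based solution can be sketched in Python under an idealized pinhole model (assumed here: apertures at z = 0, sensor at z = -f, depth z_p measured from the aperture plane, so that R = (z_p + f)/z_p and z_p = f/(R - 1)). The hypothetical function `solve_position` fits R by least squares over all matched pairs, a swapped-in choice that avoids dividing by very small aperture distances, rather than the plain average of ratios described in the text, and then averages the per-image estimates of x_p and y_p. The synthetic data are illustrative, not measurements.

```python
import numpy as np


def solve_position(x_s, y_s, x_a, y_a, f):
    """Recover R and the target position P = (x_p, y_p, z_p) from matched pairs.

    Assumed (hypothetical) pinhole model: aperture plane at z = 0, sensor
    plane at z = -f, target depth z_p measured from the aperture plane, so
    x_si - x_sj = R * (x_ai - x_aj) with R = (z_p + f) / z_p.
    """
    x_s, y_s, x_a, y_a = (np.asarray(v, dtype=float) for v in (x_s, y_s, x_a, y_a))
    i, j = np.triu_indices(len(x_s), k=1)                       # all pairs i < j
    d_img = np.concatenate([x_s[i] - x_s[j], y_s[i] - y_s[j]])  # imaging distances
    d_ap = np.concatenate([x_a[i] - x_a[j], y_a[i] - y_a[j]])   # aperture distances
    R = np.dot(d_ap, d_img) / np.dot(d_ap, d_ap)  # least squares for d_img = R * d_ap
    z_p = f / (R - 1.0)
    x_p = np.mean((R * x_a - x_s) / (R - 1.0))    # per-image estimates, averaged
    y_p = np.mean((R * y_a - y_s) / (R - 1.0))
    return x_p, y_p, z_p, R


# Usage with synthetic, noise-free data (all lengths in mm; values are illustrative).
f = 7.0
x_true, y_true, z_true = 120.0, -80.0, 3000.0
gx, gy = np.meshgrid([-2.0, -0.9, 0.4, 1.8], [-1.6, 0.3, 1.2, 2.1])
x_a, y_a = gx.ravel(), gy.ravel()                 # matched sub-eye aperture centres
x_s = x_a + (x_a - x_true) * f / z_true           # forward pinhole projection
y_s = y_a + (y_a - y_true) * f / z_true
x_p, y_p, z_p, R = solve_position(x_s, y_s, x_a, y_a, f)
```

With noise-free input the fit returns the true position. With real centroids, the small value of R - 1 (about 0.0023 at 3 m for f = 7 mm) is what makes the depth sensitive to centroid error.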

Microsystem design
There are many optical structures that can realize large-FOV perception; the two most common are the fisheye system and the compound eye system. Both introduce light from various incident angles in the environment onto the image sensor through an optical device without direction specificity, and at the sensing unit the incident light from different directions is distinguished by coding the position of the target image and its light intensity distribution. The planar compound eye 3D perception microsystem combines the optical structure of the compound eye with a planar image sensor that acts like a retina. Unlike the uniform hexagonal mosaic structure of biological compound eyes, the positions and spacings of the sub-eyes are coded, so the code is moved forward from the sensing unit to the optical system. Depending on the angle of the incident light, the image from each sub-eye can fall anywhere on the image sensor plane. The sensing unit only collects information and carries no code, which preserves the multiplexing of the photosensitive pixels, as shown in Fig. 2 (a). Coding the geometric positions of the sub-eye apertures is more precise and has higher resolution than coding the presence, brightness or darkness of the target image, and the code value is almost unaffected by the shape and light intensity of the target, so it provides more accurate information for 3D perception.
In the 3D perception process of the microsystem, the arrangement of sub-eyes directly affects the matching between sub-eyes and target images, and hence the perception accuracy. To limit computational complexity, the sub-eyes of the compound eye optical system are arranged on a grid, so that the 2D sub-eye aperture array can be compressed into two one-dimensional (1D) coded vectors, which greatly reduces the computing power required, as shown in Fig. 2 (b). As shown by the imaging principle in Fig. 2 (a), when light enters at a certain angle, some of the sub-eyes on the compound eye optical system are imaged onto the image sensor according to the projection relationship. Taking the x direction as an example, the x coordinates of the target image centroids are used to calculate the imaging distances in the x direction (collect_x1, …, collect_xm). These imaging distances are then matched against the sub-eye aperture distances in the x direction (code_x1, …, code_xn): the ratios between the imaging distances and the aperture distances are computed, and their standard deviation is taken as the matching value. Sliding the collected sequence along the code gives the matching values at different positions. The matching function between target images and sub-eye apertures should have approximately orthogonal characteristics: the matching value at the best-matched point should be clearly distinguishable from the values at non-matched points, which gives the recognition strong fault tolerance. Based on this principle, the code of the compound eye optical system (see Materials and methods) is designed with a genetic algorithm [35], inspired by orthogonal coding methods in communications. As shown in Fig. 2 (b), the matching value is 0 only at the best-matched point and clearly larger at all other points. The y direction uses the same code as the x direction (i.e., code_yi = code_xi, i = 1, …, n), and the two orthogonal 1D vectors are extended into a grid of sub-eye apertures on the compound eye optical system. The average of the ratios between the imaging distances and the sub-eye aperture distances is R, which is used to compute the 3D position of the target.
Limited by diffraction at the sub-eye aperture, the target image is a superposition of diffraction spots from different incident angles. The energy concentration of the diffraction spot directly affects the centering precision and thus the measurement precision. The shape of the sub-eye aperture therefore needs to be optimized so that the diffraction spot energy for light at a given incident angle is as concentrated as possible, improving the imaging quality of the target. The aperture was optimized in the previous work [32], but only for the main incidence angle (along the line connecting the center of the sub-eye aperture and the center of the image sensor). As Fig. 3 shows, each sub-eye has a wide FOV (see Supplementary Table 2), so optimizing only for the main incidence angle defocuses targets at the edge of the sub-eye FOV. The imaging quality on the two sides of the image sensor also differs noticeably (see Figure S14), which further degrades the centering precision of the diffraction spots at the edge of the sub-eye FOV. The aperture size therefore has to be optimized according to the FOV of each sub-eye. To simplify the analysis, a rectangular aperture is chosen. The incident light is assumed to have a FOV that varies with the sub-eye position only in the plane formed by one side of the aperture (l_p) and the optical axis, and a fixed FOV along the other side (l_v) (see Figs. 3 (a) and (c)). The Fresnel-Kirchhoff diffraction formula (Equation (4)) is used to simulate the diffraction spots under different conditions (see Materials and methods). With a 7 mm focal length and a 5 mm × 5 mm image sensor, the FOV in the l_v direction (~±20°) changes the spot dispersion area by less than 10%, which can be neglected, as shown in Fig. 3 (b). The FOV in the l_p direction, by contrast, strongly affects the dispersion area of the diffraction spot, as shown in Fig. 3 (c); the effect becomes more severe as the FOV range widens and moves toward the edge of the FOV (see Fig. 3 (d) and Supplementary Note 2). (The influence of the sub-eye focal length and of the wavelength of the incident light is discussed in Supplementary Note 3.) The size l_p is chosen to minimize the mean and standard deviation of the diffraction spot area over the sub-eye FOV (see Supplementary Note 2), and l_v is set to the l_p value of the sub-eye on the optical axis. This makes the imaging quality of each sub-eye more uniform across its FOV, keeping the spot centering precision consistent and thereby improving the measurement precision of the system. For sub-eyes at different distances from the optical axis, the corresponding aperture sizes follow from their FOVs (see Supplementary Table 3).
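The sliding ratio-matching step can be sketched as follows. The matching value at each offset is the standard deviation of the ratios between the collected imaging distances and the coded aperture distances, as described above; the code values, the helper name `matching_values`, and the assumption that the collected distances correspond to consecutive gaps in the code (no missing or spurious images) are illustrative choices, not the paper's design.

```python
import numpy as np


def matching_values(code, collected):
    """Slide the collected imaging distances along the aperture-distance code.

    For each offset j, take the ratios collected[k] / code[j + k] and return
    their standard deviation (the matching value); a true match gives ~0.
    """
    code = np.asarray(code, dtype=float)
    collected = np.asarray(collected, dtype=float)
    m = len(collected)
    return np.array([np.std(collected / code[j:j + m])
                     for j in range(len(code) - m + 1)])


# Hypothetical code (aperture distances in mm) and a window seen by the sensor.
code = np.array([0.52, 0.61, 0.47, 0.70, 0.55, 0.66, 0.49, 0.58, 0.73, 0.50])
R = 1.0023                          # ratio set by the target depth (illustrative)
collected = R * code[3:7]           # imaging distances of four matched gaps
values = matching_values(code, collected)
best = int(np.argmin(values))       # the true offset, 3
R_est = np.mean(collected / code[best:best + len(collected)])  # recovers R
```

Because the true window yields identical ratios, its matching value is essentially zero, while every other offset mixes unrelated spacings and gives a clearly larger standard deviation; this separation is what the code design aims to maximize.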
Following the above analysis, the compound eye optical system was designed with 2879 sub-eyes, as shown in Fig. 4 (a), a number comparable to the number of ommatidia in common insects [7]. It is fabricated with MEMS processing technology (see Materials and methods and Fig. 4 (b)). Combining it with an image sensor (2048 × 2048 pixels, 2.4 μm pixel size), the associated electronic processing circuits and a mechanical structure, a compound eye 3D perception microsystem with a focal length of 7 mm and a conical FOV of 120° was assembled, as shown in Figs 4 (c) and (d).

Experiment for 3D perception
A test setup was built to evaluate the 3D perception ability of the microsystem, as shown in Fig. 5 (a). The microsystem was placed on a rotating platform, and a luminous target (a white broad-spectrum light source with an illuminance of 100 lx and a size of ~10 cm) was mounted on a guide rail so that it could be moved to any position over a large distance range. The spherical wave from the target illuminated the microsystem, and the coded pattern (Fig. 5 (b)) was collected by the image sensor and solved by the 3D position determination algorithm (Equation (3)). The precision guide rail (positioning accuracy < 0.1 mm) provided the reference (true) distance. The target was placed at different distances from the microsystem (from 500 mm to 10000 mm in steps of 500 mm), and the microsystem was rotated to different angles with the rotating platform (from 0° to 60° in steps of 20°) to test the accuracy at different spatial positions in the half FOV. Because the microsystem is symmetric (its x and y designs are identical, and the measurement principle is the same for the positive and negative directions of the x and y axes), these experiments represent its measurement performance over the entire conical FOV. At each position, 100 frames were collected; their average was taken as the measurement result, and their standard deviation was used to evaluate precision. The experimental results, shown in Fig. 5 (c) and (d), demonstrate that the compound eye microsystem can perceive and measure the 3D position of a target within a distance range of 10 m and a conical FOV of 120°. The average positioning error over the full FOV is 4.86% within 5 m and 7.32% within 10 m, and the measurement error increases with the measuring distance (see Supplementary Note 4 for detailed measurement data). Note that only target images falling on the sensing area of the image sensor (about 5 mm wide; 2048 × 2.4 μm ≈ 4.9 mm) take part in the calculation, so the baseline of the multi-aperture system is at most about 5 mm. With a measuring distance of up to 10 m, the distance-baseline-ratio therefore exceeds 2000. Next, the target was placed at a fixed distance from the microsystem and moved in small steps of 10 mm to examine the resolution of the microsystem. For example, at 3000 mm, the step fitted from the microsystem's results is 10.83 mm (Fig. 5 (e)), close to the set value. This verifies that the microsystem can resolve closely spaced positions, and that within a certain range its measurement result increases linearly as the distance increases linearly. Finally, an insect-shaped target was moved along a horizontal path within the FOV of the microsystem, as shown in Fig. 5 (f). The measured result, shown in Fig. 5 (g), is consistent with the movement trajectory (see the Supplementary Movie for an intuitive display). These experiments demonstrate that the microsystem can locate spatial targets with high precision over a wide FOV and a large distance range.
Without the spatial constraints of the laboratory, this microsystem could possibly measure even longer distances. The measurement error, however, also increases with distance, because the relationship between the sub-eye imaging parallax and the measurement distance is nonlinear. To achieve higher positioning accuracy within a specified range, the instrument can be calibrated before use by measuring a target at a known spatial location (see Supplementary Note 5). Imaging of target details is limited by sub-eye aperture diffraction; embedding microlenses or metalenses into the sub-eye apertures could give clearer imaging and broader target recognition. Working with cooperative targets carrying infrared light sources, the system could also measure relative distances in hidden environments. The system only computes the centroid coordinates of the target images, so the calculation is simple and requires little computing power. The measurement update rate is mainly limited by the image sensor frame rate, and an image sensor with a higher frame rate would give higher sensitivity to motion.
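A first-order error propagation shows why the error grows with distance. Assuming the idealized pinhole relation z_p = f/(R - 1), so that |dz_p/dR| = z_p²/f, and independent centroid errors σ_c on the two images of a pair, one pair with baseline b has σ_R ≈ √2·σ_c/b, giving σ_z ≈ √2·z_p²·σ_c/(f·b): the error grows with the square of the distance. The helper `depth_sigma` and the 0.1-pixel centroid error below are hypothetical, not figures from the experiments.

```python
import numpy as np


def depth_sigma(z_p, f, baseline, sigma_centroid):
    """First-order depth uncertainty of one sub-eye pair (assumed pinhole model).

    R = imaging distance / baseline and z_p = f / (R - 1), so |dz_p/dR| = z_p**2 / f.
    Independent centroid errors on both images give sigma_R = sqrt(2) * sigma_c / baseline.
    """
    sigma_R = np.sqrt(2.0) * sigma_centroid / baseline
    return z_p ** 2 / f * sigma_R


f, baseline = 7.0, 5.0                  # mm: focal length and ~5 mm maximum baseline
sigma_c = 0.1 * 2.4e-3                  # hypothetical: 0.1 pixel of 2.4 um, in mm
distances = np.array([1000.0, 5000.0, 10000.0])        # mm
errors = depth_sigma(distances, f, baseline, sigma_c)  # grows as distance**2
```

Doubling the distance quadruples the single-pair error in this model, which is one reason calibration at known positions helps within a specified range.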

Conclusion
Existing 3D perception in scientific research and industry is mainly realized through binocular stereo vision [36]. The baseline of such systems directly affects the measurement distance and measurement accuracy [37], so they are usually bulky, and their FOV is small because of the long focal length. Instruments that achieve wide-FOV 3D measurement and perception over a large distance range, such as scanning lidar [38,39] and laser trackers [40,41], generally require laser transmitters, mechanical rotating devices, laser receivers, etc., and are therefore bulky and structurally complex. In this paper, a compound eye microsystem is developed that can perceive the 3D position of a target within a conical FOV of 120° and a distance range of 10 m. It is small, and its distance-baseline-ratio exceeds 2000, far higher than reported in the existing literature. Like a binocular vision system, this microsystem adopts the parallax measurement principle. However, because the compound eye measurement system has ~3000 sub-eyes, ~100 sub-eyes participate in each calculation, and averaging their results reduces the random error. Treating these ~100 sub-eyes as ~50 independent pairs, the random error is theoretically reduced to 1/√50 of that of a binocular vision system with the same focal length and baseline, improving the measurement precision. The image sensor multiplexing principle gives the compound eye microsystem a much wider FOV than a binocular vision system. Compared with existing compound eye 3D perception systems, the long-focal-length sub-eyes ensure accurate measurement of the light incident vector and greatly improve the perception accuracy and range.
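The 1/√50 figure follows from averaging independent estimates. A quick Monte Carlo check, assuming each of ~50 sub-eye pairs gives an estimate with independent, identically distributed Gaussian error (systematic errors would not average out this way):

```python
import numpy as np

# Hypothetical model: each sub-eye pair yields an estimate with independent,
# zero-mean Gaussian error of the same standard deviation sigma.
rng = np.random.default_rng(0)
trials, n_pairs, sigma = 20000, 50, 1.0

single_pair = rng.normal(0.0, sigma, size=trials)                       # binocular: one pair
averaged = rng.normal(0.0, sigma, size=(trials, n_pairs)).mean(axis=1)  # mean of 50 pairs

ratio = averaged.std() / single_pair.std()  # should be close to 1 / sqrt(50), about 0.141
```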
The microsystem shares its instrument architecture with that of the previous work [32]. The previous work achieved wide-FOV, high-resolution orientation measurement of targets at infinite (sufficiently far) distance; this paper addresses 3D spatial position measurement of targets at finite distance and is an important supplement to and extension of that work. In addition, this paper proposes an orthogonal aperture coding design and a method for optimizing the aperture size according to the sub-eye FOV, which further improve the redundancy of the algorithm, the computation speed and the image quality while retaining the advantages of the previous work. Furthermore, by estimating the sub-eye imaging parallax and R, the system can switch between 2D orientation perception of targets at infinite distance and 3D position perception of targets at finite distance, and can thus measure the orientation and spatial coordinates of targets at both infinite and finite distances. This paper provides a 3D perception method with a high distance-baseline-ratio which, after adaptation to specific applications, can be widely used for interaction between embodied intelligence and the environment, visual positioning and navigation, UAV obstacle avoidance, target acquisition, unmanned system formation, etc.

The design process of sub-eye code values by genetic algorithm
The genetic algorithm is a highly parallel, stochastic and adaptive global optimization algorithm based on natural selection and evolution in biology [42]. Optimizing the sub-eye arrangement template with a genetic algorithm involves the following steps: a) parameter coding, b) initial population generation, c) fitness function computation, d) selection/replication, e) crossover and f) mutation. Steps c) to f) are iterated until the convergence conditions are met. In this paper, the minimum matching value over the non-matched points is used as the fitness function, to ensure the orthogonality of the sub-eye aperture coding. The above process can be implemented directly with MATLAB's Optimization Tool.
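A toy version of the genetic-algorithm search can be written in a few lines of NumPy. This is a hypothetical sketch, not the authors' MATLAB setup: each gene is one aperture distance within assumed bounds of 0.4 to 0.8 mm, the fitness is the minimum matching value (standard deviation of ratios) over all non-matched offsets, which the algorithm tries to maximize, and tournament selection, uniform crossover, Gaussian mutation and elitism are generic choices. The code length n = 20 and window length m = 6 are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def fitness(code, m):
    """Smallest matching value over all non-matched offsets (larger is better).

    Every length-m window of the code plays the role of a collected sequence
    (taking R = 1); the matching value is the standard deviation of the
    element-wise ratios, and the true offset (the diagonal) is excluded.
    """
    win = sliding_window_view(code, m)              # shape (W, m)
    ratios = win[:, None, :] / win[None, :, :]      # [true window s, offset j, k]
    values = ratios.std(axis=2)                     # (W, W) matching values
    np.fill_diagonal(values, np.inf)                # drop the best-matched points
    return values.min()


def evolve(n=20, m=6, d_min=0.4, d_max=0.8, pop=40, gens=60, p_mut=0.1, seed=0):
    """Toy real-valued GA over n aperture distances (all parameters hypothetical)."""
    rng = np.random.default_rng(seed)
    # a) parameter coding: one real gene per aperture distance (mm)
    # b) initial population
    population = rng.uniform(d_min, d_max, size=(pop, n))
    for _ in range(gens):
        # c) fitness of every individual
        scores = np.array([fitness(c, m) for c in population])
        elite = population[np.argmax(scores)].copy()
        # d) selection: binary tournaments
        a, b = rng.integers(0, pop, size=(2, pop))
        parents = population[np.where(scores[a] >= scores[b], a, b)]
        # e) crossover: uniform crossover with the neighbouring parent
        mask = rng.random((pop, n)) < 0.5
        children = np.where(mask, parents, np.roll(parents, 1, axis=0))
        # f) mutation: Gaussian noise on a fraction p_mut of genes, kept in bounds
        noise = rng.normal(0.0, 0.05, size=(pop, n)) * (rng.random((pop, n)) < p_mut)
        children = np.clip(children + noise, d_min, d_max)
        children[0] = elite                         # elitism: keep the best so far
        population = children
    scores = np.array([fitness(c, m) for c in population])
    return population[np.argmax(scores)], scores.max()


best_code, best_score = evolve()
```

Because the best individual is carried over each generation, the returned score never falls below the best score of the initial random population. A constant code, by contrast, scores 0 because every window matches every offset.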

Imaging simulation of sub-eye aperture
The target image formed by a sub-eye aperture is a diffraction image described by the Fresnel-Kirchhoff diffraction formula, Equation (4). For near-parallel light, the complex amplitude of the diffracted light wave takes the form given in [32], where A is a constant related to the intensity of the target, λ is the wavelength of the light wave, k is the wavenumber, x_0 and y_0 are the horizontal and vertical coordinates of the integration surface element dσ in the aperture region, α and β are two direction cosines of the incident light, the vectors n, r and l are defined in Figure S18, and r is the norm of the vector r.
For a sub-eye aperture at a given distance from the optical axis, the FOV in the x and y directions can be calculated by Equations (5) and (6) [32], where (x_a, y_a) are the x and y coordinates of the center of the sub-eye aperture, l_sensor is the size of the sensing area of the image sensor, and f is the focal length of the microsystem. The calculated values are substituted into Equation (4) as the incident angles of the light and, together with the other parameter settings, give simulated images for different parameters. The area containing 80% of the energy of the diffraction spot (called EE80) is used as the evaluation criterion [43], which allows the image quality to be assessed quantitatively.
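The FOV calculation and the EE80 criterion can be combined in a small 1D sketch. Two simplifications are assumed: the sub-eye FOV is taken as the range of angles of rays through the aperture center that land on a sensor of width l_sensor centered on the axis, and diffraction is modeled with the 1D Fresnel approximation for a slit at normal incidence (via `scipy.special.fresnel`) instead of the full Fresnel-Kirchhoff integral at oblique incidence. EE80 is taken as the total width of the brightest samples holding 80% of the energy. In this toy model, a slit width near √(λf) ≈ 0.061 mm (λ = 532 nm, f = 7 mm) gives a smaller EE80 width than a much narrower, diffraction-dominated slit or a much wider slit dominated by its geometric shadow. The function names are hypothetical.

```python
import numpy as np
from scipy.special import fresnel


def sub_eye_fov(x_a, l_sensor, f):
    """Angular range (degrees) of rays through an aperture centred at x_a that land on the sensor.

    Assumed geometry: a sensor of width l_sensor centred on the optical axis and
    the aperture plane a distance f in front of it (a stand-in for Equations (5)-(6)).
    """
    lo = np.degrees(np.arctan((x_a - l_sensor / 2) / f))
    hi = np.degrees(np.arctan((x_a + l_sensor / 2) / f))
    return lo, hi


def slit_intensity(x, width, wavelength, z):
    """Relative 1D Fresnel-approximation intensity a distance z behind a slit.

    Normal incidence only; scipy.special.fresnel returns (S, C).
    """
    scale = np.sqrt(2.0 / (wavelength * z))
    s1, c1 = fresnel(scale * (-width / 2 - x))
    s2, c2 = fresnel(scale * (width / 2 - x))
    return (c2 - c1) ** 2 + (s2 - s1) ** 2


def ee80_width(x, intensity, fraction=0.8):
    """Total width of the brightest samples holding `fraction` of the energy (1D EE80)."""
    dx = x[1] - x[0]
    ranked = np.sort(intensity)[::-1]
    cumulative = np.cumsum(ranked) / ranked.sum()
    return (np.searchsorted(cumulative, fraction) + 1) * dx


# Usage (lengths in mm): sensor width 5, focal length 7, wavelength 532 nm.
lo, hi = sub_eye_fov(7.969, 5.0, 7.0)    # about 38.0 to 56.2 degrees
x = np.linspace(-2.0, 2.0, 4001)          # observation coordinate on the sensor
ee = {w: ee80_width(x, slit_intensity(x, w, 532e-6, 7.0)) for w in (0.02, 0.06, 0.3)}
```

Under these assumptions, an aperture centered about 7.97 mm off-axis sees roughly 38° to 56°, similar to the l_p range simulated in Fig. 3 (c); the full design additionally accounts for oblique incidence across that range.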

Processing and manufacturing of compound eye optical system
The planar compound eye optical system is fabricated by the same MEMS process as in the previous work [32], as shown in Fig. 4 (b). The quartz glass substrate of the MEMS aperture array is first cleaned, and a 100 nm chromium layer is deposited on its surface. Following the MEMS mask fabrication process, photoresist is coated on the chromium-plated quartz substrate and exposed by a laser or electron beam according to the design of the compound eye optical system. After the exposed photoresist is removed, the uncovered chromium is etched away. The etched regions then transmit light and act as sub-eyes, while the remaining regions block light. After the resist is stripped, the planar compound eye optical system is obtained. This process is compatible with semiconductor manufacturing and requires no complex 3D etching or integration steps.

Fig. 1
Fig. 1 Principle and application of microsystem.a Schematic diagram of the use of a compound eye microsystem on embodied intelligent robots.b Microsystem architecture and multiplexing image sensor principle.c Principle of 3D position measurement.d Schematic diagram of application of compound eye microsystem in target perception, tracking, formation and other scenarios

Fig. 2
Fig. 2 Coding and layout design of sub-eyes. a Perception principles of different systems. b Schematic diagram of the 2D code and 1D matching operation, where STD is the standard deviation function, n is the number of sub-eye aperture distances in the x and y directions and m is the number of imaging distances in the x and y directions

Fig. 3
Fig. 3 Optimal design of the sub-eye aperture. a Simulation of the influence of the FOV (-21° to 21°) in the l_v direction on the diffraction spot when the incident angle in the l_p direction is kept at 50°. The simulation parameters are l_p = 0.122 mm, l_v = 0.063 mm and wavelength 532 nm. b Effect of the FOV range in the l_v direction on the diffraction spot for different incident angles in the l_p direction. c Simulation of the effect of the FOV range (38° to 56°) in the l_p direction on the diffraction spot when the incident angle in the l_v direction is kept at 0°. The simulation parameters are l_p = 0.122 mm, l_v = 0.063 mm and wavelength 532 nm. d Diffraction spot area (EE80, see Materials and methods) for aperture sizes from 0.07 mm to 0.2 mm over the FOV range of 32° to 59°

Fig. 4
Fig. 4 Implementation of the compound eye 3D perception system. a The compound eye optical system. b MEMS process for manufacturing the compound eye optical system. c and d Structure diagram and photograph of the compound eye 3D perception system, respectively

Fig. 5
Fig. 5 Compound eye 3D perception experiment and results. a Drawing of the experimental equipment. b Imaging of insect-shaped and triangular targets. c Positioning results at different positions in the full FOV. d Comparison between the measured Euclidean distance from the target to the microsystem and the set value. e Test results of small-step motion. f and g Perception experiment diagram and perception result, respectively, for the target moving along a horizontal path