Indoor Visible-Light 3D Positioning System Based on GRU Neural Network

With the continuous development of artificial intelligence technology, visible-light positioning (VLP) based on machine learning and deep learning algorithms has become a research hotspot in indoor positioning. To improve the accuracy of robot positioning, we established a visible-light three-dimensional (3D) positioning system consisting of two LED lights and three photodetectors, with the three photodetectors located on the robot's head. We considered the impact of line-of-sight (LOS) and non-line-of-sight (NLOS) links on the received signals and used gated recurrent unit (GRU) neural networks to deal with the nonlinearity in the system. To address the problem of poor stability during GRU network training, we used a learning rate decay strategy to improve the performance of the GRU network. The simulation results showed that, in a space of 4 m × 4 m × 3 m, the average positioning error of the system was 2.69 cm when only LOS links were considered and 2.66 cm when both LOS and NLOS links were considered, with 95% of the positioning errors within 7.88 cm. For two-dimensional (2D) positioning at a fixed height, 80% of the positioning errors were within 9.87 cm. These results show that the system has strong anti-interference ability, achieves centimeter-level positioning accuracy, and meets the requirements of indoor robot positioning.


Introduction
With the progress of human beings and the development of technology, the application scenarios of robots have become more complex and diversified, and robots need to complete more difficult and intelligent work. To improve the efficiency and performance of robots, the positioning and navigation of autonomous robots are essential. At present, wireless positioning technologies such as wireless local area networks (WLANs), Bluetooth, radio frequency identification (RFID), ZigBee, and ultra-wideband (UWB) are commonly used for indoor positioning [1][2][3][4][5], but these wireless technologies generally have disadvantages such as high electromagnetic radiation, high deployment costs, and low positioning accuracy [6]. Compared with these wireless technologies, visible light has the advantages of abundant bandwidth resources, no electromagnetic pollution, and low equipment costs, and it can achieve lighting and positioning at the same time. As a new type of wireless positioning technology, LED-based visible-light positioning has become a research hotspot in the field of wireless positioning [7].
In recent years, with the development of artificial intelligence, machine learning and deep learning algorithms, with their strong self-learning and generalization abilities, have become able to provide accurate positioning results in the context of VLP, and increasing numbers of researchers have applied them to indoor visible-light positioning. Abu Bakar et al. [8] used a weighted k-nearest neighbor (WKNN) algorithm for localization in a fingerprint recognition technique based on received signal strength (RSS). The results show

System Model
The indoor visible-light localization model designed in this study is shown in Figure 1. The room size was set to 4 m × 4 m × 3 m, and a corner of the room was used as the origin to establish the Cartesian coordinate system of the space. We used two LEDs as transmitters, placed on the ceiling, and each LED sent signals of different frequencies. To fully receive the signal sent by the transmitter, we used three PDs as receivers, located at the front, the left rear, and the right rear of the robot's head. The model structure of the robot head receiver is shown in Figure 2, which represents the robot head as a hemispherical model; the three PDs on the head are equidistant from the top center point. In this robot head receiver model, the top center point O was used as the test point; r is the radius of the hemisphere; l is the length of the arc between point O and PD_i; α_i (i = 1, 2, 3) is the azimuth angle of PD_i; θ is the central angle of the arc between point O and PD_i; and β (0° < β < 90°) is the elevation angle of PD_i. Therefore, the position (x_i, y_i, z_i) of PD_i is related to the position (x_0, y_0, z_0) of the top center point O through L, the horizontal distance between point O and PD_i, and H, the vertical distance between point O and PD_i.
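Under the hemispherical receiver geometry defined above, the quantities can be related as follows. This is a sketch reconstructed from the stated definitions (central angle from the arc length, offsets from the radius and central angle), not a quotation of the paper's own equations:

```latex
% Central angle of the arc between O and PD_i (arc length l, radius r)
\theta = \frac{l}{r}
% Horizontal and vertical offsets of PD_i relative to the top center point O
L = r\sin\theta, \qquad H = r\left(1 - \cos\theta\right)
% Position of PD_i given the azimuth angle \alpha_i of PD_i, i = 1, 2, 3
x_i = x_0 + L\cos\alpha_i, \qquad
y_i = y_0 + L\sin\alpha_i, \qquad
z_i = z_0 - H
```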

Channel Model
The indoor visible-light channel model is shown in Figure 3 for the direct link model and the reflected link model, respectively. For an LOS link model, the indoor optical signal transmission link is short, so the attenuation of the optical signal caused by absorption and scattering is small. However, for an NLOS link model, because the indoor walls, floors, and other objects with reflection characteristics cause the diffuse reflection of the optical signal, the optical signal transmission link becomes longer, increasing the attenuation of the optical signal. Therefore, we considered the transmission of optical signals through both LOS and NLOS links. This not only conformed to the real-world environment but also allowed further study of the adverse effects of reflection on system performance, making the positioning system more reliable and practical. In the LOS link model, the relationship between the received power P_LOS of the PD and the LED transmitted power P_t can be expressed as [19]

P_LOS = P_t · H_LOS(0),
where H_LOS(0) is the DC gain of the LOS link. Assuming that the LEDs obey the Lambert radiation model, H_LOS(0) can be expressed as [20]

H_LOS(0) = ((m + 1) A_PD / (2π d²)) cos^m(φ) T_s(ψ) g(ψ) cos(ψ), 0 ≤ ψ ≤ ψ_FOV,

and H_LOS(0) = 0 for ψ > ψ_FOV, where A_PD is the effective receiving area of the PD; d is the distance from the PD to the LED; m is the Lambertian emission order; φ is the emission angle of the LED; T_s(ψ) is the optical filter gain; g(ψ) is the gain of the optical concentrator; and ψ and ψ_FOV are the incidence and field-of-view (FOV) angles of the PD, respectively. m and g(ψ) can be expressed as [21]

m = −ln(2) / ln(cos(φ_1/2)),

g(ψ) = n² / sin²(ψ_FOV) for 0 ≤ ψ ≤ ψ_FOV, and g(ψ) = 0 for ψ > ψ_FOV,

where φ_1/2 is the semi-angle at half power of the LED emitters, and n is the internal refractive index of the optical concentrator. In this paper, two LEDs placed on the ceiling were used as light sources, and three PDs placed on the hemispherical surface were used as receivers. Each PD had a certain inclination angle, and the radiating angle cosine of the LED and the incidence angle cosine of the inclined PD could be expressed as [22]

cos(φ) = h / d,

cos(ψ) = (v⃗_PD_LED · n⃗_PD) / (|v⃗_PD_LED| |n⃗_PD|),   (10)

where h is the vertical height of the LED in relation to the PD; v⃗_PD_LED is the direction vector from the PD to the LED; and n⃗_PD is the normal vector of the PD receiving surface, which can be expressed as

n⃗_PD = (cos(α_r) sin(β_r), sin(α_r) sin(β_r), cos(β_r)),   (11)

where α_r and β_r are the azimuth and tilt angles of the PD, respectively.
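The Lambertian LOS gain described above can be sketched in a few lines of Python. The numeric parameter values below (distance, PD area, FOV) are illustrative, not the paper's simulation settings:

```python
import math

def lambertian_order(phi_half_deg):
    """Lambertian emission order m from the LED semi-angle at half power."""
    return -math.log(2) / math.log(math.cos(math.radians(phi_half_deg)))

def h_los(d, cos_phi, cos_psi, psi_fov_deg, a_pd, phi_half_deg, ts=1.0, n=1.5):
    """DC gain of the LOS link under the Lambertian model.
    d: LED-PD distance (m); cos_phi / cos_psi: emission / incidence angle cosines;
    a_pd: effective PD area (m^2); ts: optical filter gain; n: concentrator index."""
    psi_fov = math.radians(psi_fov_deg)
    if cos_psi < math.cos(psi_fov):          # incidence outside the field of view
        return 0.0
    m = lambertian_order(phi_half_deg)
    g = n**2 / math.sin(psi_fov)**2          # optical concentrator gain
    return (m + 1) * a_pd / (2 * math.pi * d**2) * cos_phi**m * ts * g * cos_psi

# Received LOS power for a 1 W transmit power, LED directly above the PD
p_los = 1.0 * h_los(d=2.0, cos_phi=1.0, cos_psi=1.0,
                    psi_fov_deg=60, a_pd=1e-4, phi_half_deg=30)
```

Note that the gain falls to zero as soon as the incidence angle leaves the FOV, which is what makes tilted PDs on the hemisphere sensitive to their orientation.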
If the LED position coordinates were (x_t, y_t, z_t), and the PD position coordinates were (x_r, y_r, z_r), then from Equations (10) and (11) we could obtain the incidence angle cosine of the inclined PD receiving LED light as

cos(ψ) = ((x_t − x_r) cos(α_r) sin(β_r) + (y_t − y_r) sin(α_r) sin(β_r) + (z_t − z_r) cos(β_r)) / d.

In a primary reflective NLOS link, the relationship between the received power P_NLOS of the PD and the LED transmitted power P_t can be expressed as

P_NLOS = P_t · H_NLOS(0),

where H_NLOS(0) is the DC gain of the primary reflected NLOS link, which can be expressed as [23]

H_NLOS(0) = Σ_{j=1}^{N} ((m + 1) A_PD / (2π² d_1j² d_2j²)) ρ ΔA cos^m(φ_1j) cos(ψ_1j) cos(φ_2j) T_s(ψ_2j) g(ψ_2j) cos(ψ_2j),

where N is the number of area elements ΔA into which the reflective walls are divided; ρ is the reflectivity of the wall; d_1j is the distance between the LED and the wall reflective element; d_2j is the distance between the wall reflective element and the PD; φ_1j is the LED emission angle; ψ_1j and φ_2j are the incidence and emission angles of the wall reflective element, respectively; and ψ_2j is the incidence angle at the PD. If the normal vector n⃗_w,j of the wall reflecting element is

n⃗_w,j = (cos(α_w,j) sin(β_w,j), sin(α_w,j) sin(β_w,j), cos(β_w,j)),

where α_w,j and β_w,j are the azimuth and tilt angles of the wall reflector element, respectively, then the cosines corresponding to φ_1j, ψ_1j, φ_2j, and ψ_2j can be expressed as

cos(φ_1j) = h_1j / d_1j,

cos(ψ_1j) = ((x_t − x_w,j) cos(α_w,j) sin(β_w,j) + (y_t − y_w,j) sin(α_w,j) sin(β_w,j) + (z_t − z_w,j) cos(β_w,j)) / d_1j,

cos(φ_2j) = ((x_r − x_w,j) cos(α_w,j) sin(β_w,j) + (y_r − y_w,j) sin(α_w,j) sin(β_w,j) + (z_r − z_w,j) cos(β_w,j)) / d_2j,

cos(ψ_2j) = ((x_w,j − x_r) cos(α_r) sin(β_r) + (y_w,j − y_r) sin(α_r) sin(β_r) + (z_w,j − z_r) cos(β_r)) / d_2j,

where h_1j is the vertical height of the LED in relation to the wall reflector element, and (x_w,j, y_w,j, z_w,j) are the position coordinates of the wall reflector element. In the VLP system, each LED is installed facing vertically downward from the ceiling, with its semi-angle at half power set to 30°, which means the amount of light that the ceiling receives directly from the LED bulb is limited. We designed the robot's shell with a low-reflectivity material, so we did not take the reflection of the robot itself into account.
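The summation over wall elements can be sketched as follows. The geometry values in the toy element list are illustrative placeholders; in the actual model they would be computed from the LED, PD, and wall-element positions via the cosine expressions above:

```python
import math

def h_nlos_element(m, a_pd, rho, dA, d1, d2,
                   cos_phi1, cos_psi1, cos_phi2, cos_psi2,
                   psi_fov_deg, ts=1.0, n=1.5):
    """Contribution of one wall area element dA to the first-reflection DC gain."""
    psi_fov = math.radians(psi_fov_deg)
    if cos_psi2 < math.cos(psi_fov):     # reflected ray arrives outside the PD's FOV
        return 0.0
    g = n**2 / math.sin(psi_fov)**2      # optical concentrator gain
    return ((m + 1) * a_pd / (2 * math.pi**2 * d1**2 * d2**2)
            * rho * dA * cos_phi1**m * cos_psi1 * cos_phi2 * ts * g * cos_psi2)

# Sum over all wall elements (here a toy list of two elements)
elements = [dict(d1=2.5, d2=1.8, cos_phi1=0.8, cos_psi1=0.7, cos_phi2=0.9, cos_psi2=0.6),
            dict(d1=3.0, d2=2.2, cos_phi1=0.6, cos_psi1=0.5, cos_phi2=0.8, cos_psi2=0.4)]
h_total = sum(h_nlos_element(m=1.0, a_pd=1e-4, rho=0.7, dA=0.01,
                             psi_fov_deg=60, **e) for e in elements)
```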
The receiver is mounted on the robot's head, and reflection from the floor is blocked by the robot. In addition, because the optical power reflected more than twice is smaller than the noise power, it can be ignored [24]. In this study, only the primary reflection from the four walls of the room was considered, which reduces the complexity of the light propagation path and simplifies VLP system design and implementation. Compared with multiple reflections, a single-reflection NLOS transmission path is more stable, and the signal quality is relatively better. The received power P_r of the PD during the transmission of the indoor LED light signal over the LOS and NLOS links could be expressed as [25]

P_r = P_LOS + P_NLOS.

GRU Neural Network Model
As general recurrent neural networks (RNNs) present the problems of long-term dependence and gradient explosion [26], Hochreiter and Schmidhuber proposed the long short-term memory (LSTM) neural network in 1997. This network contains input, forget, and output gates that control the input, memory, and output values, respectively [27]. The LSTM network can therefore effectively mitigate gradient vanishing and gradient explosion and is highly effective for large-scale problems; thus, it is widely used. The GRU network was proposed by Kyunghyun Cho et al. in 2014 as a highly effective variant of the LSTM network [28], and the basic GRU unit structure is shown in Figure 4. In a classical GRU network, the forward propagation equations at moment t are as follows:

r_t = σ(W_rx · x_t + W_rh · h_{t−1} + b_r),   (21)

z_t = σ(W_zx · x_t + W_zh · h_{t−1} + b_z),   (22)

h̃_t = tanh(W_hx · x_t + W_hh · (r_t * h_{t−1}) + b_h),   (23)

h_t = (1 − z_t) * h_{t−1} + z_t * h̃_t,   (24)

y_t = σ(W_o · h_t + b_o),   (25)

where · and * denote matrix multiplication and the element-wise (Hadamard) product, respectively; W_rx, W_rh, W_zx, W_zh, W_hx, W_hh, and W_o are the hidden layer weights; b_r, b_z, b_h, and b_o are the hidden layer biases; x_t is the input at moment t; h_{t−1} is the hidden layer output state at moment t − 1; r_t and z_t are the reset gate and update gate, respectively; h̃_t is the candidate state at moment t; h_t is the hidden layer output state at moment t; y_t is the output at moment t; and σ and tanh are activation functions. In general, σ is the sigmoid function,

σ(x) = 1 / (1 + e^{−x}),

and tanh is the hyperbolic tangent function,

tanh(x) = (e^{x} − e^{−x}) / (e^{x} + e^{−x}).

As with LSTM networks, GRU networks can also overcome the long-term dependency problem of traditional RNNs; however, the GRU network merges the input and forget gates of the LSTM network into a single update gate, so the only two gates in the GRU network are the reset and update gates. In Equation (21), the reset gate r_t controls the extent to which the hidden layer output state h_{t−1} at moment t − 1 is passed to the candidate state h̃_t at moment t.
In Equation (22), the update gate z_t determines the extent to which the output state h_{t−1} at moment t − 1 is carried to moment t. In Equation (23), the candidate state h̃_t uses the reset gate r_t to store past information. Because the output of the reset gate passes through the sigmoid function, each element of its output matrix lies between 0 and 1, so the reset gate controls the size of the gate opening; a value closer to 1 indicates that more information is memorized. In Equation (24), the update gate z_t determines how much of the candidate state h̃_t at moment t and of h_{t−1} at moment t − 1 is retained, and the retained information is used as the output state h_t of the hidden layer at moment t. For Equation (25), the hidden layer output state h_t at moment t is generally used directly to produce the output y_t at moment t. The output at time t is passed to time t + 1 to continue forward propagation as the input at time t + 1.
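The gate and state updates described above can be sketched as a minimal numpy forward pass. The weight shapes and random initialization are illustrative, not the paper's trained values, and the output layer y_t is omitted for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, b):
    """One GRU forward step: reset gate, update gate, candidate state, new state."""
    r = sigmoid(W["rx"] @ x_t + W["rh"] @ h_prev + b["r"])      # reset gate
    z = sigmoid(W["zx"] @ x_t + W["zh"] @ h_prev + b["z"])      # update gate
    h_cand = np.tanh(W["hx"] @ x_t + W["hh"] @ (r * h_prev) + b["h"])
    return (1 - z) * h_prev + z * h_cand                        # hidden state h_t

rng = np.random.default_rng(0)
n_in, n_hid = 2, 24                                             # 24 units, as chosen later
W = {"rx": rng.normal(size=(n_hid, n_in)), "rh": rng.normal(size=(n_hid, n_hid)),
     "zx": rng.normal(size=(n_hid, n_in)), "zh": rng.normal(size=(n_hid, n_hid)),
     "hx": rng.normal(size=(n_hid, n_in)), "hh": rng.normal(size=(n_hid, n_hid))}
b = {k: np.zeros(n_hid) for k in ("r", "z", "h")}

h = np.zeros(n_hid)
for x_t in np.array([[0.1, 0.9], [0.4, 0.6], [0.8, 0.2]]):      # a length-3 sequence
    h = gru_step(x_t, h, W, b)
```

Because h_t is a convex combination of h_{t−1} and a tanh output, the hidden state stays bounded in (−1, 1), which is part of why the GRU avoids the exploding activations of a plain RNN.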
We compared the commonly used recurrent neural networks under identical parameter settings. As shown in Table 1, while maintaining prediction accuracy, the GRU network has lower model complexity than the LSTM model, which not only reduces the number of training parameters but also shortens the network training time.

Construction of Fingerprint Database
The robot moves in an indoor space, and the maximum height during its activities is uncertain. In this study, we took the average height of a person, 1.7 m, as the maximum height during robot activity. Therefore, a volume of 4 m × 4 m × 1.7 m in the room was used as the positioning space, divided into sections of 0.18 m × 0.18 m × 0.18 m. The four vertices of each small square area after division were used as reference points, the robot head receiver model was placed at each reference point, and the top center point coincided with the reference point. We used the three PDs to acquire the optical signals and then filtered them. Thus, we obtained two signals of different frequencies and calculated their optical power values. Finally, we recorded the optical power values and position coordinates obtained at each reference point in the fingerprint database. The fingerprint data at the k-th reference point can be expressed as

F_k = (P_k11, P_k12, P_k21, P_k22, P_k31, P_k32, x_k, y_k, z_k),

where P_kij (i = 1, 2, 3; j = 1, 2) is the optical power value of the j-th LED light source received by the i-th PD at the k-th reference point, and (x_k, y_k, z_k) are the position coordinates of the k-th reference point. Therefore, the VLP fingerprint database F_db could be constructed as

F_db = [F_1, F_2, . . . , F_N]^T,

where N is the number of reference points. After dividing the positioning space into 0.18 m × 0.18 m × 0.18 m sections, the data obtained at the reference points were used as the training set. In addition, the positioning space was divided into 0.24 m × 0.24 m × 0.24 m sections, and the data obtained at these reference points were used as the test set. The training set was used to train the network model and provide it with predictive ability, and the test set was used to evaluate the performance of the trained network model.
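The reference-point grid and database layout described above can be sketched in a few lines. The grid spacings below reproduce the training and test set sizes quoted in the simulation section (5290 and 2312 points); the power columns are placeholders to be filled by the channel model:

```python
import numpy as np

def reference_points(size=(4.0, 4.0, 1.7), step=0.18):
    """Grid of reference points spaced `step` apart inside the positioning space."""
    axes = [np.arange(0.0, s + 1e-9, step) for s in size]
    xs, ys, zs = np.meshgrid(*axes, indexing="ij")
    return np.stack([xs.ravel(), ys.ravel(), zs.ravel()], axis=1)

points = reference_points()                # training grid, 0.18 m spacing
# One fingerprint row per point: 6 power values (3 PDs x 2 LEDs) + 3 coordinates.
powers = np.zeros((len(points), 6))        # placeholder; filled by the channel model
f_db = np.hstack([powers, points])
```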

Data Preprocessing
GRU neural networks are very sensitive to the input data, so we needed to normalize the input data. This process mapped the input data onto the same scale, so that data of different dimensions had equal importance in the network. This not only improved the speed of network convergence but also eliminated the influence of dimensions on the final result. We normalized the input data using

x_norm = (x − x_min) / (x_max − x_min),

where x is the input data for the training set, x_min is the minimum value of all input data in the training set, x_max is the maximum value of all input data in the training set, and x_norm is the normalized input data. In addition, the GRU network required three-dimensional tensor inputs, so the optical power data needed to be converted into three-dimensional tensors before they were fed into the network. The converted k-th power data could be represented as

I_k = [ P_k11  P_k12
        P_k21  P_k22
        P_k31  P_k32 ].

Then, the input data could be expressed as I = [I_1, I_2, . . . , I_n], where n is the number of input data, and the shape of the input data is (n, 3, 2).
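The normalization and tensor conversion above can be sketched as follows. The raw power values are illustrative; in practice x_min and x_max would be taken from the training set and reused for the test set:

```python
import numpy as np

def preprocess(powers, p_min=None, p_max=None):
    """Min-max normalize the power data and reshape to an (n, 3, 2) tensor:
    3 PDs along axis 1, 2 LED frequencies along axis 2."""
    powers = np.asarray(powers, dtype=float)        # shape (n, 6)
    p_min = powers.min() if p_min is None else p_min
    p_max = powers.max() if p_max is None else p_max
    norm = (powers - p_min) / (p_max - p_min)
    return norm.reshape(-1, 3, 2), p_min, p_max

raw = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]])
X, p_min, p_max = preprocess(raw)
```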

Selection of Performance Indicators
We used the mean squared error (MSE) and root mean squared error (RMSE) to evaluate the performance of the GRU network and VLP models.
The loss and evaluation functions of the GRU network model used the MSE, which effectively represents the error between the predicted and actual outputs of the network. During neural network training, the gradient obtained from the loss function was fed into the optimizer for gradient descent, and the network weights were then updated by backpropagation. We trained the network repeatedly to continuously improve its predictive capability. Finally, the test set was substituted into the trained network model, and the network performance was evaluated using the MSE, calculated as follows:

MSE = (1/N) Σ_{i=1}^{N} [ (x_i − x̂_i)² + (y_i − ŷ_i)² + (z_i − ẑ_i)² ],

where N is the number of samples, (x_i, y_i, z_i) are the true values of the i-th sample point, and (x̂_i, ŷ_i, ẑ_i) are the predicted values of the i-th sample point.
In the positioning process, the RMSE better reflects the relationship between the predicted and true positions, so the RMSE was used to calculate the VLP error. The RMSE between the true and predicted coordinates of the k-th reference point could be expressed as

RMSE_k = sqrt( (x_k − x̂_k)² + (y_k − ŷ_k)² + (z_k − ẑ_k)² ),

where (x_k, y_k, z_k) are the true coordinates of the k-th reference point in the test set, and (x̂_k, ŷ_k, ẑ_k) are the predicted coordinates of the k-th reference point in the test set. Therefore, the average positioning error was

E_avg = (1/N) Σ_{k=1}^{N} RMSE_k.
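Since the per-point RMSE above reduces to the Euclidean distance between the true and predicted coordinates, the positioning metrics can be sketched as:

```python
import numpy as np

def positioning_errors(true_xyz, pred_xyz):
    """Per-point Euclidean error (the RMSE of each reference point) and its mean."""
    true_xyz = np.asarray(true_xyz, dtype=float)
    pred_xyz = np.asarray(pred_xyz, dtype=float)
    per_point = np.sqrt(((true_xyz - pred_xyz) ** 2).sum(axis=1))
    return per_point, per_point.mean()

# Toy example: one point predicted 3 cm / 4 cm off in x / y, one point exact
true = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
pred = np.array([[0.03, 0.04, 0.0], [1.0, 1.0, 1.0]])
errs, avg = positioning_errors(true, pred)   # errs[0] = 0.05 m, avg = 0.025 m
```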

Building the GRU Network Model
We used Python 3.9 for the experiments, with TensorFlow 2.6 and the Keras 2.6 deep learning framework to build the GRU network models. When building a network model, its initial weights are random, so the predictions of the trained model differ each time; to achieve reproducible experimental results, we fixed the random seed before building the network model. In addition, during network model construction, one must manually configure the number of GRU network layers and the number of neurons in each layer. Furthermore, before training the network, one must also set hyperparameters such as the learning rate, number of iterations, and batch size. These parameters affect the complexity and performance of a model, so they need to be set appropriately. Below, we present a comparison and analysis of different hyperparameter values.
To explore the influence of the number of neurons on the accuracy of the model, we compared numbers of neurons at intervals of eight.
As shown in Figure 5, the average positioning error was lower when the number of neurons in the GRU network layer was 24. However, the complexity of the model also increased when the number of neurons exceeded 24, and the average positioning error did not change significantly with an increase in the number of neurons. Therefore, the number of neurons in the GRU layer of the network model was set to 24. After settling on 24 network neurons, we analyzed the influence of the number of GRU network layers on the model performance.
From Table 2, one can see that the mean squared error and average localization error of the GRU network were smallest when the number of layers was two, and the model performance was improved. As the number of network layers increased beyond two, the error increased: additional layers require more weights and training time, which increases the complexity of the network model and leads to overfitting, reducing the model's accuracy. Therefore, we set the number of layers in the GRU network to two. The batch size is the number of samples selected for training at one time, and backpropagation is performed by calculating the gradient over these samples, so it affects the degree of optimization and the speed of a model.
In this study, the compared batch sizes were 16, 32, 64, 128, and 256. From Table 3, one can see that when the batch size was too small, the calculated gradient was unstable due to the paucity of samples, and the network did not converge easily, causing the model accuracy to decrease. However, the network generalization ability was reduced when the batch size was too large, though the network model error did not change significantly. Table 3 also shows that the training time decreased as the batch size increased. According to our comparative analysis, the model was most effective when the batch size was set to 128.
Table 4 shows the effect of the learning rate on the model performance. The model performed best when the learning rate was set to 0.01, and the decreasing curve of the network loss function is shown in Figure 6. Figure 6 shows that when the number of iterations reached around 950, the loss curve was relatively flat, with no downward trend in subsequent iterations. To prevent overfitting and reduce the training time, the maximum number of iterations of the network was set to 950.
During network training, the gradient descent was slow when the learning rate was too small; thus, the training time had to be increased to bring the model closer to the local optimum. However, the gradient decreased quickly when the learning rate was too large: oscillation easily occurred in the later stage of training, stabilization to a local optimum was not straightforward, and gradient explosion could occur. To ensure that the network converged quickly at the beginning of training and more smoothly at the end of training, we proposed a strategy to adjust the learning rate dynamically. The learning rate decay curve could be expressed as

lr(epoch) = a / (1 + e^{c(epoch − b)}),

where epoch is the iteration number of network training, and a, b, and c are set values satisfying a > 0, b > 0, and c > 0. Here, a is the upper convergence boundary of the learning rate decay curve; at epoch = 0, lr(0) = a/(1 + e^{−bc}), which is close to a when e^{−bc} << 1. Therefore, a can be regarded as the initial learning rate, and a = 0.01 was adopted in this study. The value b marks the inflection point of the curve; lr is larger in the interval epoch ∈ [0, b), so the gradient descent is faster and the network converges rapidly. Additionally, lr decreases continuously after epoch = b, so the gradient descent slows down, which effectively suppresses gradient oscillation in the late training period, and the network is more easily stabilized to the local optimum. The constant c governs the steepness of the curve at the inflection point: the higher the value of c, the faster the curve falls there. Based on continuous testing, the average positioning error was small when a = 0.01, b = 700, and c = 0.02, and the corresponding learning rate decay curve is shown in Figure 7. As shown in Table 5, the learning rate decay strategy proposed in this paper yielded higher VLP system accuracy, indicating that the method was effective.
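With the chosen constants a = 0.01, b = 700, and c = 0.02, the decay schedule can be sketched as:

```python
import math

def lr_schedule(epoch, a=0.01, b=700, c=0.02):
    """Sigmoid-shaped learning rate decay: close to `a` before the inflection
    point `b`, then falling off at a rate controlled by `c`."""
    return a / (1.0 + math.exp(c * (epoch - b)))

# Early in training the rate stays near a; at the inflection point it is a/2.
early, mid_pt, late = lr_schedule(0), lr_schedule(700), lr_schedule(950)
```

In a Keras training loop, a function of this form could be passed to the `tf.keras.callbacks.LearningRateScheduler` callback; the constants here are the values reported above.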
Therefore, the GRU network model was constructed according to the parameters established above, and its structure is shown in Figure 8. The model contained one input layer and three output layers; that is, the power data were input into the network, and the output comprised the three coordinates. The hidden layer used three identical network structures, each containing two GRU network layers.
In order to transform the data format of the GRU layer output into the final output data format, a dense layer was added before the output layer, and the network model parameters are shown in Table 6.

Simulation Results and Analysis
To verify the localization performance of the proposed algorithm, a simulation environment was built according to the indoor visible-light localization model in Figure 1. We placed the hemispherical receiver model at each reference point in the positioning space and used the three PDs on the hemispherical surface to acquire the signals sent by the two LEDs. The simulation parameters are shown in Table 7. In the simulation, each LED emitted a cosine AC signal, and to ensure that the LED communicated while providing normal lighting, we added a DC bias to the LED signal. At the receiving end, the phase of the AC signal received by the PD depended on the transmission path of the signal, so the phase of the received signal differed in each iteration. To be realistic, a phase shift of kT was applied to the LED emission signal in the simulation, where k ∈ [0, 1) is a randomly generated value and T is the period of the LED emission signal.
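The biased, randomly phase-shifted LED signal described above can be sketched as follows; the frequency, amplitude, and DC bias values are illustrative, not the Table 7 settings:

```python
import numpy as np

def led_signal(t, freq, amplitude=1.0, dc_bias=2.0, rng=None):
    """Cosine AC signal with a DC bias and a random phase shift k*T, k in [0, 1)."""
    if rng is None:
        rng = np.random.default_rng()
    T = 1.0 / freq
    k = rng.random()                       # random fraction of one signal period
    return dc_bias + amplitude * np.cos(2 * np.pi * freq * (t - k * T))

t = np.linspace(0.0, 1e-3, 1000)          # 1 ms observation window
s = led_signal(t, freq=2000.0, rng=np.random.default_rng(1))
```

The DC bias keeps the signal non-negative so the LED stays lit, while the random k models the path-dependent phase seen at the receiver.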
We obtained the simulated fingerprint data from the VLP model; the sizes of the training and testing sets were 5290 and 2312, respectively. The training set was used to train the GRU neural network, and after training was completed, the testing set was fed into the trained model to predict the positions. The three-dimensional positioning results predicted using the GRU network model for the LOS link and LOS + NLOS link scenarios are shown in Figure 9. Table 8 shows that the average localization error of the VLP model was 2.69 cm when only the LOS link was considered, while the average localization error was 2.66 cm when both the LOS and NLOS links were considered. Figure 10 indicates that 95% of the positioning errors were within 7.88 cm, showing that the model achieved centimeter-level positioning accuracy and met the needs of indoor positioning for robots.
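The network was trained with the learning-rate decay strategy described earlier. A minimal sketch of one common form, exponential decay, is shown below; the schedule form and constants here are assumptions for illustration, not the paper's exact schedule:

```python
def decayed_lr(lr0, decay_rate, step, decay_steps):
    # Learning rate falls smoothly from lr0 by a factor of
    # decay_rate every decay_steps training steps.
    return lr0 * decay_rate ** (step / decay_steps)

# Sample the schedule every 100 steps over 1000 steps
lrs = [decayed_lr(1e-3, 0.9, s, 100) for s in range(0, 1000, 100)]
```

A decaying rate lets training take large steps early and smaller, more stable steps near convergence, which is the stability benefit the paper attributes to the strategy.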

In this study, we used the same GRU network structure to make separate predictions for the x, y, and z coordinates. To examine the network's prediction of each coordinate, we analyzed the error distribution of each coordinate separately. As can be seen from Figure 11, 90% of the errors for LOS + NLOS links were within 0.0265 m. Among them, the error in predicting the x-coordinate was the largest; as can be seen from Figure 1, the arrangement of the LEDs in the x-axis direction had a greater influence on the optical signal received by the receiver.
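Statistics such as "90% of the errors were within 0.0265 m" come from the empirical CDF of the per-sample positioning errors. A minimal sketch, using synthetic predictions with illustrative ~1 cm noise rather than the paper's data:

```python
import numpy as np

def error_percentile(pred, true, q=90):
    """Euclidean positioning errors and their q-th percentile."""
    errs = np.linalg.norm(pred - true, axis=1)
    return errs, np.percentile(errs, q)

rng = np.random.default_rng(1)
true = rng.uniform(0, 4, size=(100, 3))               # points in the 4 m room
pred = true + rng.normal(scale=0.01, size=(100, 3))   # ~1 cm noise, illustrative
errs, p90 = error_percentile(pred, true)
```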
To analyze the influence of height on the accuracy of the model, we compared the two-dimensional positioning errors of the planes corresponding to different positioning heights. Table 9 shows the average and maximum positioning errors for the two-dimensional planes with the receiver placed at different heights under the LOS and LOS + NLOS link scenarios. When the positioning height was 0.24 m, the average positioning error of the model was the smallest for both LOS and LOS + NLOS links: the minimum values were 1.32 cm and 1.34 cm, respectively, and the maximum errors were 8.72 cm and 6.9 cm, respectively. However, when the positioning height was 1.68 m, the average positioning error was the highest for both LOS and LOS + NLOS links, with average values of 7.75 cm and 7.84 cm, respectively, and maximum errors of 101.65 cm and 75.6 cm, respectively.

Figure 12 shows that 80% of the positioning errors were within 9.87 cm for the different positioning heights under the LOS and LOS + NLOS link scenarios, and 80% of the positioning errors were within 3.44 cm for positioning heights below 1.44 m. Moreover, the difference between the CDF curves of the positioning error produced by the proposed algorithm for the LOS and LOS + NLOS link scenarios was small, which indicated that the algorithm had good generalization ability and robustness across different links. Therefore, we only discuss the positioning results for LOS + NLOS links below.

Figure 13 shows that when the positioning height was low, the errors were basically the same. When the positioning plane rose to a certain height, the positioning error also increased, and this trend was more obvious when the positioning height increased from 1.44 m to 1.68 m. An analysis of Equations (15) and (27) reveals that the positioning error was mainly due to measurement errors related to the DC gains H_LOS(0) and H_NLOS(0) of the channel. When the positioning height increased, the emission angle of the LED light source also increased, and, according to Equations (4) and (12), this led to higher attenuation of the optical signal, thereby increasing the error of the optical signal received by the PD and reducing the positioning accuracy.
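The attenuation effect can be illustrated with the standard Lambertian LOS channel DC gain, the general form behind expressions like Equation (4). The Lambertian order, detector area, and field of view below are illustrative assumptions:

```python
import math

def lambertian_los_gain(d, phi, psi, m=1.0, area=1e-4, fov=math.radians(70)):
    """Standard Lambertian LOS DC gain (angles in radians).

    d: LED-PD distance, phi: emission angle at the LED,
    psi: incidence angle at the PD; gain is zero outside the PD's FOV.
    """
    if psi > fov:
        return 0.0
    return (m + 1) * area / (2 * math.pi * d ** 2) \
        * math.cos(phi) ** m * math.cos(psi)

# Gain shrinks as the emission angle grows (distance and incidence fixed),
# which is why higher receiver planes see weaker, noisier signals.
g_small = lambertian_los_gain(2.0, math.radians(10), math.radians(10))
g_large = lambertian_los_gain(2.0, math.radians(60), math.radians(10))
```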

Conclusions
We proposed an indoor visible-light three-dimensional positioning system based on a GRU neural network to address the low positioning accuracy of existing indoor robots. After the GRU network model was established, a learning rate attenuation strategy was proposed to improve the performance of the GRU network. A receiver placed on the robot's head was used to collect optical power data, and the position coordinates were then predicted with the trained GRU neural network. The experimental results showed that the average 3D positioning error was 2.69 cm when considering only LOS links, while the average error was 2.66 cm when considering LOS and NLOS links at the same time, and 95% of the positioning errors were within 7.88 cm. For two-dimensional positioning with a fixed positioning height, 80% of the positioning errors were within 9.87 cm. When the positioning height was 0.24 m, the average positioning error of the model under the LOS and LOS + NLOS link scenarios was 1.32 cm and 1.34 cm, respectively. Therefore, the proposed method can achieve centimeter-level positioning accuracy and meet the needs of indoor robot positioning.
Author Contributions: L.Q.: conceptualization, investigation, supervision, resources, and writing (review). W.Y.: conceptualization, methodology, investigation, data curation, formal analysis, software, writing (original draft and editing), and validation. X.H. and D.Z.: resources and visualization. All authors have read and agreed to the published version of the manuscript.
