Updating analysis of key performance indicators of 4G LTE network with the prediction of missing values of critical network parameters based on experimental data from a dense urban environment

In practice, field measurements often contain missing data owing to several dynamic factors. However, complete data about a given environment are key to characterizing the radio features of the terrain for a high quality of service. To address this problem, field data were collected from a dense urban environment, and the missing parameters were predicted using the Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) algorithm. The field measurements were taken around Victoria Island and Ikoyi in Lagos, Nigeria. The test equipment comprises a Global Positioning System (GPS) receiver and a Fourth Generation (4G) Long Term Evolution (LTE) modem equipped with a 2×2 MIMO antenna, employing 64 Quadrature Amplitude Modulation (QAM). The modem was installed on a personal computer and assembled inside a test vehicle driven at a near-constant speed of 30 km/h to minimize possible Doppler effects. Specifically, the test equipment records 67 LTE parameters at 1 s intervals, including the time and coordinates of the mobile station. Thirty-two parameters were logged at 42,498 instances, corresponding to 11 h, 48 min and 18 s of data logging on the mobile terminal. Sixteen important 4G LTE parameters were extracted and analyzed. The statistical errors were calculated both when the missing values were excluded from the analyses and when they were filled in using the PCHIP algorithm. In particular, this update paper estimated the missing values of critical network parameters using the PCHIP algorithm, which was not covered in the original article. Also, the error statistics between the data (histograms) and the corresponding probability density function curves are derived for the measured data with missing values and for the data in which the missing values were filled using the PCHIP algorithm. Additionally, the accuracy of the PCHIP algorithm was analysed using standard statistical error analysis. More network parameters have been tested in the update article than in the original article, which presented only basic statistics and fewer network parameters. Overall, the results indicate that only the parameters that measure throughput follow the half-normal distribution, while the others follow the normal distribution.


Specifications Table
Subject: Engineering and Technology
Specific subject area: Wireless Communications Engineering
Type of data: Table, Chart, Graph, Figure
How the data were acquired: The data analyzed in this article were acquired through a drive test. A 4G LTE modem installed on a computer and a Global Positioning System (GPS) receiver were assembled in a vehicle driven at 30 km/h. The measured data were logged at one-second intervals.

Value of the Data
• The original data considered only six key performance indicators (KPIs) obtained from three sites, whereas sixteen KPIs have been tested in the updated data collected from seven eNodeBs. The updated data are robust and will aid efficient network design and planning to ensure a high quality of service for real-time wireless applications.
• The new data analysis provides a method of estimating missing values for different 4G LTE network parameters, such as the RSRP, RSRQ, RSSI and others, for the benefit of mobile subscribers and all parties in the wireless ecosystem.

Data Description
Radio propagation measurements of key performance indicators (KPIs) in a typical wireless communication network are critical to assessing the quality of service (QoS) of a functional wireless network [1][2][3]. In practice, it is challenging to obtain all measurement parameters in complete detail [4]. Some parameters are often not logged, or are missing from the actual measurements, because of a large distance between the transmitter and receiver and other dynamic environmental factors. To estimate the missing values of these parameters, we employ the Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) algorithm [5][6][7][8].
Generally, LTE systems have eNodeBs that communicate with the user equipment (UE) [9][10][11]. Transferring data from eNodeBs to UEs is known as downlink transmission, while moving data from UEs to the eNodeBs is known as uplink transmission [12]. The deployment of real-time applications, such as virtual meeting applications, has increased owing to the impact of Covid-19 [13,14]. However, these applications require good QoS. To provide good quality of service for mobile subscribers, it is essential to determine the key parameters influencing the QoS [15][16][17]. To this end, this article analyses KPI parameters such as RSRQ, RSRP, RSSI and thirteen others, as seen in the tested dataset [18]. Specifically, the dataset captured includes the …
This update paper focuses on estimating the missing values for each network parameter and on evaluating, via statistical error analysis, the PCHIP algorithm used to predict them. Only three site locations were tested in the original article, whereas seven eNodeBs have been investigated in the current paper. The original article considered only six key performance indicators (KPIs), whereas sixteen important 4G LTE parameters have been extracted and analysed in the update article. The methods used to produce the data in the update article differ slightly from those used to create the data in the related data article. Here, the data are time-stamped and logged at 1 s intervals, and thirty-two parameters, including the logging time and coordinates, were recorded every second. Measured data were logged for a total of 42,498 instances. This extensive measurement campaign produced better results than the limited logging methods applied in the original article.
The extensive logging and coverage also indicate that the update data greatly complement the existing dataset. The new data do not invalidate the original dataset but add remarkable value to it.
Regarding the measurement equipment, a newer version of the 4G LTE modem was used in the updated measurement because of its fast processing capabilities. The new modem has a higher upload speed and a shorter download processing time. It is also built on the Balong 5000 chipset, which supports carrier aggregation and enables a 5G measurement campaign. The theoretical peak download speed of the Huawei modem used in the updated article reaches up to 3.6 Gbps, whereas the modem used in the initial measurements offered an LTE download speed of up to 100 Mbit/s and an LTE upload speed of up to 50 Mbit/s. Other measurement tools used in the original experiment were retained. The data acquired in the update article were analysed using MATLAB R2020a, whereas MATLAB R2018a was used in the initial analysis. The features of the newer MATLAB version also help simplify and speed up data processing.
The original article did not consider the missing values of key network parameters, which are included in the update article. Specifically, we estimated the missing data using the Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) algorithm, which was not covered in the original article. We also derived the error statistics between the data (histograms) and the corresponding probability density function curves, both for the measured data with missing values and for the data in which the missing values are filled using the PCHIP algorithm; this aspect, too, was not considered in the original article. Additionally, in the updated paper, the accuracy of the PCHIP algorithm was analysed using standard statistical error analysis, and more network parameters have been incorporated in the current investigation. These parameters could be analysed further by investigating the outliers among the filled values and either excluding those values from the inquiry or interpolating them further. Again, this analysis extends the original article, which presented only basic statistics and fewer network parameters.
The KPI data would help in evaluating the performance of the network. The data would be very valuable to the network operators and the regulatory agencies for informed decision making. The data would help develop and test efficient algorithms to study critical performance indicators for emerging wireless communication systems. Also, the wireless community and all parties in the communication ecosystem will find the projected data useful for learning-based algorithmic development, network planning, design, implementation, optimization and management.

Drive Test and Data Exploration
The test vehicle was driven at 30 km/h, and field measurements were taken at 1 s intervals.
The longitude and latitude information traces the route covered, as shown in Fig. 1. The route covers Victoria Island (VI) and Ikoyi, Lagos, Nigeria, where the corporate headquarters of multinational and national companies are located. Sixteen (16) key parameters were measured at one-second intervals while the UE moved at 30 km/h. The total duration of data measurement is 11 h, 48 min and 18 s, which gives a total of 42,498 instances of each parameter. However, some instances returned no values. Table 1 summarises the parameters measured, the number of missing values and the summary of the existing instances, excluding the missing values.
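The instance count follows directly from the logging duration and the 1 s logging interval. A minimal sketch of that arithmetic is given below, together with a hypothetical helper for counting missing entries. The function names are illustrative, and the representation of an unlogged instance as `None` is an assumption, since the storage format of missing values is not specified here.

```python
def samples_in_duration(hours, minutes, seconds, interval_s=1):
    """Number of logging instants in a session sampled every interval_s seconds."""
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return total_seconds // interval_s


def count_missing(series):
    """Count entries that returned no value (assumed to be stored as None)."""
    return sum(1 for value in series if value is None)


# 11 h, 48 min and 18 s of logging at 1 s intervals
print(samples_in_duration(11, 48, 18))  # 42498

# Illustrative values only, not taken from the dataset
print(count_missing([-95.0, None, -97.5, None]))  # 2
```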

Statistical Characteristics
The statistical characteristics of the data measured at one-second intervals from different UE positions, at a vehicular speed of 30 km/h, are presented in Tables 2-9. Specifically, Table 2 shows the Reference Signal Received Power (RSRP) and Reference Signal Received Quality (RSRQ). Table 3 highlights the Received Signal Strength Indicator (RSSI) and the Primary Component Carrier (PCC) Signal-to-Interference-and-Noise Ratio (SINR). Table 4 shows the Physical Cell Identity (PCI) and the E-UTRA Absolute Radio Frequency Channel Number (EARFCN). The 1st Secondary Component Carrier (SCC1) RSRP and SCC1 RSRQ are shown in Table 5. In addition, Table 6 shows the SCC1 RSSI and SCC1 PCI, and Table 7 presents the SCC1 SINR and SCC1 DL EARFCN. The PCC Physical Uplink Shared Channel (PUSCH) power and PCC Physical Uplink Control Channel (PUCCH) power are given in Table 8. Finally, the Packet Data Convergence Protocol (PDCP) throughput for the downlink (DL) and the Radio Link Control (RLC) throughput DL are shown in Table 9.

Explanation of the Piecewise Cubic Hermite Interpolating Polynomial (PCHIP)
The PCHIP is a third-degree piecewise polynomial function whose shape-preserving characteristics are more robust than those of cubic splines. This shape-preserving feature makes the PCHIP an attractive technique for detailed dataset curve fitting and analysis in this paper. … (1) and (2). The accuracy of the PCHIP algorithm is then analyzed using statistical error analysis. These analyses could be further enhanced by investigating the outliers among the filled values and either excluding those values from the investigation or interpolating them further. A detailed description of the PCHIP algorithm and the associated equations is provided in [5][6][7][8].
In data analysis, applying cubic splines to interpolate a time series can produce unrealistic overshoots, which are undesirable in practice. Such overshoots typically occur when an increment in the independent variable is accompanied by large variations between successive samples. To address this problem, we employed the PCHIP, which avoids unrealistic overshoots: by design, the interpolant is strictly bounded by the data points. The cubic polynomial between a pair of data points is derived from the data values at those points and from selected values of the derivative there. The derivative at a given data point is chosen by examining that point together with the data points to its left and right. In summary, our justification for using the PCHIP method lies in its capability to avoid unrealistic overshoots.
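The derivative selection and the bounded behaviour described above can be illustrated with a short sketch. MATLAB provides this interpolant as the built-in `pchip` function; the code below is a plain-Python sketch of the standard Fritsch-Carlson scheme, not the code used for the analysis. The function names and the RSRP-like sample values are hypothetical, missing instances are assumed to be stored as `None`, and only gaps lying between two logged values are filled.

```python
from bisect import bisect_right


def _sign(v):
    return (v > 0) - (v < 0)


def _end_slope(h0, h1, d0, d1):
    """Three-point end slope, limited so that the end interval keeps its shape."""
    s = ((2 * h0 + h1) * d0 - h0 * d1) / (h0 + h1)
    if _sign(s) != _sign(d0):
        return 0.0
    if _sign(d0) != _sign(d1) and abs(s) > 3 * abs(d0):
        return 3 * d0
    return s


def pchip_slopes(x, y):
    """Derivative at each data point, chosen from its left and right neighbours."""
    n = len(x)
    h = [x[k + 1] - x[k] for k in range(n - 1)]
    delta = [(y[k + 1] - y[k]) / h[k] for k in range(n - 1)]
    if n == 2:
        return [delta[0], delta[0]]
    d = [0.0] * n
    for k in range(1, n - 1):
        # Zero slope at local extrema; weighted harmonic mean otherwise
        if delta[k - 1] * delta[k] > 0:
            w1 = 2 * h[k] + h[k - 1]
            w2 = h[k] + 2 * h[k - 1]
            d[k] = (w1 + w2) / (w1 / delta[k - 1] + w2 / delta[k])
    d[0] = _end_slope(h[0], h[1], delta[0], delta[1])
    d[-1] = _end_slope(h[-1], h[-2], delta[-1], delta[-2])
    return d


def pchip_eval(x, y, d, xq):
    """Evaluate the cubic Hermite piece that contains xq."""
    k = min(max(bisect_right(x, xq) - 1, 0), len(x) - 2)
    h = x[k + 1] - x[k]
    t = (xq - x[k]) / h
    h00 = (1 + 2 * t) * (1 - t) ** 2
    h10 = t * (1 - t) ** 2
    h01 = t * t * (3 - 2 * t)
    h11 = t * t * (t - 1)
    return h00 * y[k] + h10 * h * d[k] + h01 * y[k + 1] + h11 * h * d[k + 1]


def fill_missing(values):
    """Fill interior None entries of a uniformly sampled series."""
    known = [(i, v) for i, v in enumerate(values) if v is not None]
    if len(known) < 2:
        return list(values)
    x = [float(i) for i, _ in known]
    y = [float(v) for _, v in known]
    d = pchip_slopes(x, y)
    filled = list(values)
    for i, v in enumerate(values):
        if v is None and x[0] < i < x[-1]:
            filled[i] = pchip_eval(x, y, d, float(i))
    return filled


# Illustrative RSRP-like samples (dBm) at 1 s spacing, not taken from the dataset
rsrp_dbm = [-95.0, -96.0, None, None, -101.0, -100.0, None, -99.0]
print(fill_missing(rsrp_dbm))
```

Each filled value stays between the two logged values that enclose its gap, which is the bounded behaviour that motivates the choice of PCHIP over an unconstrained cubic spline.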

Statistical Error Analyses
The Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) algorithm was used to predict the missing values for each parameter. Here, the statistical errors obtained with the missing values excluded are compared with those obtained with the missing values filled by the PCHIP algorithm. The statistical errors of interest are the Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE) and Mean Squared Error (MSE) [19,20]. Table 10 shows the MAE and RMSE analyses, and Table 11 shows the RAE and MSE analyses. It can be seen that predicting/estimating the missing values for the measured parameters using the PCHIP algorithm …
Table 10. Results showing the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) between the data (histograms) and the corresponding probability density function (pdf) curves for both the measured data with missing values and the data in which the missing values are filled using the PCHIP algorithm.
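The four error measures can be sketched as follows, assuming their usual textbook definitions (the exact formulas applied are given in [19,20] and are not restated here). The sketch compares the heights of a density-normalized histogram with a normal pdf evaluated at the bin centres. The function names, the bin count and the sample values are hypothetical, and fitting the pdf with the sample mean and standard deviation is an assumption.

```python
from math import exp, pi, sqrt
from statistics import mean, pstdev


def normal_pdf(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


def density_histogram(data, n_bins):
    """Bin centres and heights of a histogram normalized to unit area."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in data:
        k = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[k] += 1
    centres = [lo + (k + 0.5) * width for k in range(n_bins)]
    heights = [c / (len(data) * width) for c in counts]
    return centres, heights


def error_metrics(observed, fitted):
    """MAE, RMSE, RAE and MSE between observed and fitted values."""
    n = len(observed)
    residuals = [o - f for o, f in zip(observed, fitted)]
    abs_sum = sum(abs(r) for r in residuals)
    mse = sum(r * r for r in residuals) / n
    mean_obs = sum(observed) / n
    # RAE: total absolute error relative to that of the mean predictor
    rae = abs_sum / sum(abs(o - mean_obs) for o in observed)
    return {"MAE": abs_sum / n, "RMSE": sqrt(mse), "RAE": rae, "MSE": mse}


# Illustrative RSRP-like values (dBm), not taken from the dataset
sample = [-101.2, -99.8, -98.5, -100.4, -97.9, -99.1, -100.0, -98.8, -99.5, -100.9]
centres, heights = density_histogram(sample, n_bins=4)
mu, sigma = mean(sample), pstdev(sample)
fitted = [normal_pdf(c, mu, sigma) for c in centres]
print(error_metrics(heights, fitted))
```

Running the same comparison once on the data with the missing values excluded and once on the PCHIP-filled data gives the two sets of errors that are compared for each parameter.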

Probability Distribution
To emphasize the importance of data preprocessing in 4G LTE data analysis, the normalized histograms of the original measured data with missing values and the histograms obtained when all the missing data are filled are plotted. The probability density function is used for the normalization [21]. The distribution of each histogram is then examined, and the corresponding probability density function (pdf) curves are drawn for both the originally measured data with missing values and the corresponding filled data of the same parameters. For instance, the pdf of the normal distribution is given in Eq. (3). Similarly, the pdf of the half-normal distribution is given in Eq. (4).
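For reference, the standard textbook forms of the two densities are as follows. The zero-location form of the half-normal density, in which σ is the standard deviation of the underlying zero-mean normal variable, is an assumption here and may differ from the parameterization used in the analysis.

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}
  \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \tag{3}

f(x \mid \sigma) = \frac{\sqrt{2}}{\sigma\sqrt{\pi}}
  \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right), \quad x \ge 0 \tag{4}
```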
where σ is the standard deviation, μ is the mean, and σ² is the variance.
Fig. 18, Fig. 19 and Fig. 20 show the histograms and pdf curves for the RSRP, RSRQ and RSSI data, respectively. The PCC SINR and PCI data are shown in the histograms and pdf curves of Fig. 21 and Fig. 22, respectively. Figs. 23 and 24 show the SCC1 PCI and SCC1 SINR, respectively. The PCC PUSCH power and PCC PUCCH power data are presented graphically in Fig. 25 and Fig. 26, respectively. Finally, Fig. 27 and Fig. 28 show graphical representations of the PDCP Throughput DL and RLC Throughput DL, respectively.
Figs. 2-17 show the measured data from the UEs' locations. The missing instances of each parameter are filled using the Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) algorithm. Specifically, Fig. 2, Fig. 3 and Fig. 4 present the serving Reference Signal Received Power, Reference Signal Received Quality and Received Signal Strength Indicator, respectively. Also, Fig. 5, Fig. 6 and Fig. 7 present the serving Primary Component Carrier Signal-to-Interference-and-Noise Ratio, Physical Cell Identity and Downlink E-UTRA Absolute Radio Frequency Channel Number, respectively. Figs. 8 to 13 show the serving 1st Secondary Component Carrier RSRP, RSRQ, RSSI, PCI, SINR and DL EARFCN. Fig. 14 and Fig. 15 present the Primary Component Carrier Physical Uplink Shared Channel Power and Physical Uplink Control Channel Power, respectively. In addition, Fig. 16 shows the Packet Data Convergence Protocol Throughput for the Downlink. Finally, Fig. 17 illustrates the Radio Link Control Throughput for the Downlink.

Experimental Design, Materials and Methods
A 4G LTE modem was used to acquire the field data. The device, equipped with a 2×2 MIMO antenna with 64 Quadrature Amplitude Modulation (QAM) capability, was mounted on a vehicle driven at an approximately constant speed of 30 km/h. The data were time-stamped and logged at 1 s intervals. Thirty-two parameters, including the logging time, longitude and latitude, were recorded every second. Data were logged for a total of 42,498 instances, which corresponds to a measurement duration of 11 h, 48 min and 18 s. However, for each parameter, some instances were not logged because of severe path losses caused by large separation distances between the transmitters and the receiver and by other obstructions in the line of sight. We used MATLAB R2020a installed on a personal computer for data curation and analysis. The data were first analyzed with the missing values excluded. After that, the missing values were estimated using the Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) algorithm.

Ethics Statements
The authors declare that they have read and followed the ethical requirements for publication in Data in Brief.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.