Energy-Efficient AP Selection Using Intelligent Access Point System to Increase the Lifespan of IoT Devices

With the emergence of various Internet of Things (IoT) technologies, energy-saving schemes for IoT devices have been rapidly developed. To enhance the energy efficiency of IoT devices in crowded environments with multiple overlapping cells, the selection of access points (APs) for IoT devices should consider energy conservation by reducing unnecessary packet transmission activities caused by collisions. Therefore, in this paper, we present a novel energy-efficient AP selection scheme using reinforcement learning to address the problem of unbalanced load that arises from biased AP connections. Our proposed method utilizes the Energy and Latency Reinforcement Learning (EL-RL) model for energy-efficient AP selection that takes into account the average energy consumption and the average latency of IoT devices. In the EL-RL model, we analyze the collision probability in Wi-Fi networks to reduce the number of retransmissions, which induce more energy consumption and higher latency. According to the simulation, the proposed method achieves a maximum improvement of 53% in energy efficiency, 50% in uplink latency, and a 2.1-times longer expected lifespan of IoT devices compared to the conventional AP selection scheme.


Introduction
The Internet of Things (IoT) is transforming our lives and workplaces, presenting unparalleled opportunities to improve efficiency, reduce costs, enhance safety, and drive innovation across a broad range of industries and applications. From smart homes and cities to healthcare, transportation, and industrial automation, IoT is reshaping how we engage with the world. One promising application of IoT technology is the use of unmanned aerial vehicles (UAVs) for effective data collection, enabling real-time monitoring and analysis in various contexts [1,2]. In particular, the healthcare industry is poised to experience significant economic growth worldwide by 2025, with estimates projecting annual growth between USD 1.1 trillion and 2.5 trillion due to the adoption and integration of IoT technology [3].
As the traffic on a Wi-Fi network increases, the cells covered by the network's access point (AP) become smaller and more crowded. As a result, mobile terminals (MTs), including IoT devices, are present within multiple overlapping cells in Wi-Fi networks [4]. In this scenario, MTs typically connect to the AP with the strongest signal, which can result in contention and packet collisions during transmission due to the concentration of the devices on particular APs. Consequently, these repetitive transmissions can disrupt energy efficiency and increase latency at the device. Additionally, non-crowded APs are underutilized, which leads to lower overall network performance. Therefore, it is important to address the issue of selecting the optimal AP that considers IoT devices' energy efficiency and latency in a multi-coverage Wi-Fi network environment.
There are two types of AP selection schemes in Wi-Fi networks: distributed and centralized. In a traditional distributed scheme, an MT selects an AP based on the received signal strength indication (RSSI) values between the MT and several available APs [5]. However, biased AP connections can occur when many MTs want to connect to a particular AP, which leads to load imbalance and poor quality of service (QoS) for MTs, including low throughput and latency performance [6]. Some studies have attempted to solve this problem by using the combination of RSSI values and other parameters [7,8], but distributed AP selection methods have limitations in addressing load balancing due to the limited information that MTs can obtain [9][10][11].
To address these issues, centralized AP selection methods have been proposed [12][13][14]. In the centralized approach, the network side, rather than the MT, chooses the most suitable AP based on factors such as RSSI value and achievable throughput. This method can help to reduce the problem of unbalanced load and enhance network performance. However, this approach does not take into account the uplink traffic and the energy efficiency of IoT devices. When aiming to provide IoT services, it is crucial to consider the uplink traffic and the energy efficiency of IoT devices because the performance (e.g., reliability, durability, etc.) highly depends on the transmission activity of the IoT devices. For example, in healthcare IoT services, uplink traffic, including sensed IoT data, is frequently transmitted to the server, and the amount of uplink traffic is much larger than that of downlink traffic. Therefore, uplink traffic deserves more consideration than downlink traffic. In addition, the frequent replacement of IoT devices due to their limited battery capacity is the most significant challenge for implementing a good-quality IoT service.
To solve the problems mentioned above, in [15] (our previous study on the iAP system), we proposed the iAP system that increases the energy efficiency of IoT devices when transmitting uplink IoT data after the AP connection procedure. However, we have recognized that the procedures for the initial AP selection and connection establishment also cause a large energy consumption of IoT devices, especially in crowded network environments. Such real-time connection dynamics between MTs and APs occur without the knowledge of future connections. The selection of an AP has a significant impact on network performance, specifically in terms of energy efficiency, as it is influenced by factors such as the uplink traffic of APs and the distance between APs and their connected MTs. However, relying solely on the RSSI between the MT and AP is inadequate for achieving optimal connections. Moreover, the number of possible cases for connections between MTs and APs grows exponentially with the number of MTs, resulting in a large search space. To effectively explore this space while considering the influence of the current AP selection on future network performance, we adopt a reinforcement learning algorithm. Therefore, in this paper, we propose an energy-efficient AP connection method using an intelligent AP (iAP) system [15] to increase the lifespan of IoT devices; in particular, we focus on an AP selection and connection method applied before transmitting uplink IoT data to achieve much better energy efficiency for IoT devices.
The main contributions of this paper are as follows:
• This paper proposes a novel energy-efficient AP selection scheme to increase the lifespan of IoT devices. To achieve this, we design an AP control system architecture that selects the optimal AP and controls operating parameters.
• We propose a new Energy and Latency Reinforcement Learning (EL-RL) model for optimal AP selection. The EL-RL model utilizes RSSI values and the number of connected IoT devices as input sequences for the AI model, with the aim of addressing the load-unbalancing problem and enhancing the energy efficiency of IoT devices. To the best of our knowledge, this represents the first attempt to consider the real-time connection dynamics and energy efficiency of IoT devices in the context of optimal AP selection.
• Based on the newly defined collision probability considering the retransmission of IoT devices, we design the energy consumption and latency estimation model of the overall IoT devices in Wi-Fi networks.
• We also analyzed the energy consumption and latency of IoT devices using a proposed energy-efficient AP selection scheme with an EL-RL model. Through extensive simulations, the proposed scheme achieved significant improvements, including a maximum of 53% in energy efficiency, 50% in uplink latency, and a 2.1-times improvement in the expected lifespan of IoT devices, compared to legacy AP selection schemes.

Related Works
Enhancing the energy efficiency of IoT devices is of paramount importance as it enables the provision of a diverse range of IoT services while simultaneously minimizing energy consumption. Significant research efforts have been devoted to this area, as evidenced by notable studies [16][17][18][19][20]. These works have made significant contributions to the understanding and development of energy-efficient solutions for IoT devices, offering valuable insights and strategies for improving their performance in terms of energy consumption and sustainability.
When multiple access points are overlapped, the selection of an appropriate AP becomes a critical concern. An energy-efficient AP selection method is required to address this challenge and enhance the energy efficiency and QoS for IoT devices in IoT services. As a result, numerous studies have focused on investigating AP selection schemes in both decentralized and centralized approaches [5][6][7][8][9][10][11][12][13][14]. These research endeavors aim to provide effective solutions for optimizing AP selection and improving the overall throughput and QoS of IoT devices in diverse IoT services (Table 1). In legacy distributed AP selection schemes, the MT selects the AP with the strongest signal [5,9], which causes an unbalanced load across the network. In [9], the authors used an RSSI interval overlap degree determination method to improve positioning accuracy, but it did not address the load-unbalancing problem. Other AP selection schemes that utilize RSSI value and achievable throughput parameters also have limitations in AP load balancing and network utilization [7,11]. While in [7] the authors used a multi-armed bandits algorithm to enhance downlink throughput, it did not consider uplink traffic. In [11], the authors increased downlink throughput using RSSI value and achievable throughput, but the authors did not consider uplink traffic and energy consumption of MTs. Even centralized AP selection approaches primarily focus on downlink throughput [12,13], without considering uplink performance and collision probability. In [12], the authors used RSSI value and achievable throughput to select the optimal AP using a centralized approach, but the authors did not consider uplink traffic and energy efficiency. 
In [13], the authors used estimated RSSI values, which are obtained by a long short-term memory (LSTM) algorithm to improve positioning accuracy while reducing computational load and enhancing noise robustness, but the authors did not consider uplink traffic and energy efficiency. In general, AP selection studies have mainly emphasized increasing downlink performance rather than considering the uplink traffic and energy efficiency of IoT devices.
For more robust and durable IoT services, new AP selection proposals are necessary because IoT devices, which are the main component of the service, are sensitive to energy and uplink delay [15]. Therefore, a new energy-efficient AP selection scheme is required to overcome the problem of biased connection to a particular AP, which increases the collision probability of the network. Particularly, the biased connection can result in an increased amount of retransmissions at IoT devices, leading to higher energy consumption and uplink latency. Therefore, in this paper, we propose a new method that considers such problems to improve the performance of Wi-Fi networks.

System Description
In this section, we introduce a novel intelligent access point (iAP) control system for energy-efficient AP selection in uplink environments for IoT services. The proposed iAP control system is an advanced centralized AP selection scheme that considers the energy consumption and latency of IoT devices. The legacy AP selection scheme chooses the closest AP based on the highest RSSI value, which is not the best AP selection for the energy efficiency of IoT devices, as it causes the retransmission problem due to load unbalancing. In contrast, the proposed iAP control system selects the optimal AP by using not only the RSSI values of the IoT device but also the number of IoT devices in the AP coverage as input sequences for reinforcement learning. Additionally, the proposed iAP control system addresses the collision issue based on a formulation of the collision probability that considers the uplink transmissions of IoT devices, aiming to minimize the number of collisions in the network. The proposed iAP control system solves the load-unbalancing problem and improves the energy efficiency and uplink latency of IoT devices, as illustrated by Figure 1, which shows an example of AP selection of IoT devices under overlapping APs. For example, from an IoT device perspective, the device can achieve a higher energy efficiency gain by balancing the energy consumption for uplink transmission and retransmission. In other words, the IoT device may spend a little more energy to connect to the sparse AP (AP2) located far from the device, but it can significantly reduce retransmission energy consumption by avoiding the collisions it would suffer when connecting to the dense AP (AP1) located closer to the device.

Architecture of iAP Control System
An overview of the proposed iAP control system is depicted in Figure 2. The software-based iAP controller is designed to facilitate energy-efficient AP selection for IoT devices. The iAP controller comprises the proposed Energy and Latency Reinforcement Learning (EL-RL) model, which is a reinforcement learning model that considers energy and latency factors to achieve optimal AP selection, as well as a transmission (Tx) power model and a location estimation model for better estimation and recommendation. The iAP controller interacts with the iAPs via an application programming interface (API) to ensure energy-efficient AP selection and load balancing. The selected iAP is responsible for managing the operational parameters of IoT devices to improve their energy efficiency. The process of the proposed iAP control system for performing energy-efficient AP selection and deciding the Tx power of IoT devices is illustrated in Figure 3. To begin, an IoT device initiates the process by transmitting a "probe request" message to iAPs. Upon receipt of the probe request message, the iAPs forward the message to the iAP controller along with the received RSSI value. Additionally, the iAPs periodically send local information, such as the number of connected IoT devices, to the iAP controller. The iAP controller utilizes global information, updated with the local information from the iAPs and various learning models, to select the optimal iAP and recommend the Tx power value for the IoT device. The selected iAP is then instructed to respond to the probe request with the recommended optimal Tx power value of the IoT device. Upon receiving the probe response message from the selected optimal iAP, the IoT device establishes a connection with the optimal iAP and transmits IoT data with the recommended Tx power.
The iAP controller employs an EL-RL model for optimal AP selection, which takes into account both energy and latency factors, to determine the energy-efficient AP selection. Additionally, the iAP controller employs a location estimation model to estimate the location of the IoT device and calculates the optimal transmit power value of the IoT device based on the estimated location.

Procedure of iAP Control System
The procedure of the iAP controller is presented in Figure 4. The summary of key symbol definitions is presented in Table 2 for reference. The iAP controller updates the global network status information that includes the number of IoT devices (N IoT ) connected to each iAP (N AP ) and the signal strength (RSSI i,j ) between the IoT device i and iAP j. Using the received RSSI i,j information from several iAPs, the iAP controller calculates the candidate iAP set C i based on the RSSI of IoT device i and the global network status. Then, the iAP controller employs the revised ideal CSMA (Carrier-Sense Multiple Access) network model to compute the energy consumption (E i,j ) and latency (L i,j ) of IoT device i in set C i . Subsequently, the iAP controller trains the EL-RL model to optimize the objective function based on the average energy consumption and the average latency of IoT device i in the candidate iAP set C i . Using the model, the iAP controller selects the iAP with the highest expected reward (considering the average energy consumption and average latency) for IoT device i. Moreover, the iAP controller determines the recommended transmitting power of IoT device i based on the fingerprinting map and assigns an iAP for the IoT device i, following which the iAP controller sends the control message to the selected iAP. Subsequently, the selected iAP transmits a "probe response" message to the IoT device i, which contains the recommended Tx power value. Upon receiving the "probe response" message, the IoT device i performs a connection handshake and transmits uplink data to the selected iAP with the recommended Tx power. The functional architecture of the iAP system, consisting of IoT device, iAP, and iAP controller, is presented in Figure 5. 
Upon achieving the optimal iAP connectivity, the IoT devices wirelessly transmit IoT data to the iAP via the MQTT (Message Queuing Telemetry Transport) application layer using the TCP (Transmission Control Protocol) transmission method to ensure the protection and reliability of the data [15]. The iAP receives and stores the IoT data in its local cache before forwarding the data to the iAP controller, which is located on a cloud server and responsible for storing and analyzing the data in a database. Using the analyzed data, the iAP controller trains various AI models that are subsequently deployed to the iAP. The device energy management module in the iAP manages the energy consumption of the IoT devices by sending control messages to the IoT devices, which contain operating variable values. The IoT devices, in turn, adjust their data transmission period, DTIM (Delivery of Traffic Indication Map) value, Tx power, and other parameters based on the received control message, thus improving their energy efficiency.
Figure 4. Procedure of the iAP controller (flowchart steps):
Send the local network status information (number of connected IoT devices and RSSI of IoT devices) to the iAP controller with period T.
Update the global network status information, which contains the number of IoT devices of iAP j and the signal strength $RSSI_{i,j}$ between IoT device i and iAP j.
Receive the "probe request" message and send the information ($RSSI_i$) of the IoT device to the iAP controller.
Receive the $RSSI_{i,j}$ information of IoT device i from several iAPs (iAP j), and get the candidate iAP set $C_i$ based on the $RSSI_{i,j}$ of the IoT device and the global network status.
Calculate the avg. energy ($E_{i,j}$) and the avg. latency ($L_{i,j}$) of IoT device i in the candidate iAP set $C_i$ based on the proposed model.
Train the Energy and Latency Reinforcement Learning (EL-RL) model to minimize the objective function using the avg. energy ($E_{i,j}$) and the avg. latency ($L_{i,j}$) of IoT device i.
Select the iAP with the best reward (considering avg. energy and avg. latency) for IoT device i.
Calculate the recommended Tx power of IoT device i based on the location estimation model, assign the iAP for IoT device i, and send the control message to the selected iAP.
The selected iAP sends a "probe response" message to IoT device i, which contains the recommended Tx power of IoT device i.
Receive the "probe response" message, perform a connection handshake, and transmit uplink data to the selected iAP with the recommended Tx power.
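The selection steps above can be sketched in a few lines of Python. This is only an illustrative sketch under our own assumptions: the RSSI threshold used to build the candidate set, the AP names, and the `toy_cost` function are hypothetical, and `toy_cost` merely stands in for the trained EL-RL policy and the energy and latency models, which are not reproduced here.

```python
from typing import Callable, Dict, List


def candidate_set(rssi_by_ap: Dict[str, float], min_rssi_dbm: float) -> List[str]:
    """Return the iAPs whose reported RSSI reaches the (assumed) threshold."""
    return sorted(ap for ap, rssi in rssi_by_ap.items() if rssi >= min_rssi_dbm)


def select_iap(
    rssi_by_ap: Dict[str, float],
    load_by_ap: Dict[str, int],
    min_rssi_dbm: float,
    cost: Callable[[float, int], float],
) -> str:
    """Pick the candidate iAP with the lowest estimated cost for this device."""
    candidates = candidate_set(rssi_by_ap, min_rssi_dbm)
    if not candidates:
        raise ValueError("no iAP satisfies the RSSI threshold")
    return min(candidates, key=lambda ap: cost(rssi_by_ap[ap], load_by_ap[ap]))


def toy_cost(rssi_dbm: float, connected_devices: int) -> float:
    """Hypothetical stand-in: weaker signal and heavier load both raise the cost."""
    return 0.1 * (-rssi_dbm) + 1.0 * connected_devices


# Hypothetical snapshot: AP1 is close but crowded, AP2 is farther but sparse.
rssi = {"AP1": -45.0, "AP2": -60.0, "AP3": -90.0}
load = {"AP1": 40, "AP2": 5, "AP3": 0}
chosen = select_iap(rssi, load, min_rssi_dbm=-80.0, cost=toy_cost)
print(chosen)
```

With these made-up numbers the sparse AP2 is preferred over the crowded AP1, which mirrors the trade-off described for Figure 1; a legacy scheme that looks only at RSSI would pick AP1.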

Energy and Latency Reinforcement Learning (EL-RL) Model
The proposed Energy and Latency Reinforcement Learning (EL-RL) model is illustrated in Figure 6. The model is designed for iAP selection, where the environment sends state information in the form of s t to the EL-RL agent. The state s t is determined based on the RSSI between the IoT device and the candidate iAP, as well as the number of MTs currently connected to the iAP. At this stage, action a t represents the candidate iAP to connect the IoT device. The numerical solver then computes the reward r t , taking into account the number of connected IoT devices and their distances from the chosen iAP. Additionally, the reward is calculated based on the average energy consumption and latency of IoT devices. Thus, the EL-RL model aims to minimize the average energy consumption and latency of all connected IoT devices, which is set as the objective function. The EL-RL agent receives the reward r t and selects a new action, and this process continues iteratively until the agent obtains the maximum reward through reinforcement learning. The notations used in the EL-RL model are defined as follows: where subscript cj is the number of candidate iAPs for connecting the IoT device i among all iAPs.
where α is the weight for avg. energy consumption and β is the weight for avg. latency. In addition, the proposed iAP control system includes a location estimation ML model. This model employs a fingerprint method, which estimates location based on RSSI values by comparing them with reference point values stored in the database. The fingerprint method is widely regarded as well suited to indoor positioning [21,22]. Once the location is estimated, the distances to each candidate iAP are calculated, and the recommended Tx power values are determined according to the adaptive Tx power equation (Equation (A1)) in Appendix A [15]. The iAP controller selects the optimal AP based on the EL-RL model and sends the recommended transmitting power to the IoT device. The iAP controller then updates the localization ML model and EL-RL model.
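As a rough illustration of the fingerprint idea, the sketch below matches a measured RSSI vector to the nearest stored reference point and then computes the distances to the candidate iAPs. It is a minimal nearest-neighbour sketch, not the paper's location estimation model; the reference points, RSSI values, and AP coordinates are hypothetical, and the adaptive Tx power equation (Equation (A1)) is deliberately left out because it is not given in this section.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def estimate_location(
    measured_rssi: Sequence[float],
    fingerprints: Dict[Point, Sequence[float]],
) -> Point:
    """Return the reference point whose stored RSSI vector is closest (Euclidean)."""
    return min(fingerprints, key=lambda p: math.dist(measured_rssi, fingerprints[p]))


def distances_to_aps(location: Point, ap_positions: Sequence[Point]) -> List[float]:
    """Distance in metres from the estimated location to every candidate iAP."""
    return [math.dist(location, ap) for ap in ap_positions]


# Hypothetical fingerprint database: position -> RSSI (dBm) seen by three iAPs.
FINGERPRINTS = {
    (0.0, 0.0): [-40.0, -70.0, -75.0],
    (10.0, 0.0): [-55.0, -55.0, -72.0],
    (20.0, 0.0): [-70.0, -42.0, -68.0],
}
AP_POSITIONS = [(0.0, 0.0), (20.0, 0.0), (10.0, 17.3)]  # assumed layout

location = estimate_location([-54.0, -57.0, -70.0], FINGERPRINTS)
distances = distances_to_aps(location, AP_POSITIONS)
print(location, distances)
```

The resulting distances are what a Tx power rule such as Equation (A1) would take as input.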
In the training procedure of the EL-RL model, each training data instance is obtained whenever a new connection is established between an MT and an AP. Each training data instance consists of the state s t , which includes information such as the RSSI between the MT and AP and the number of already connected MTs for each AP. Additionally, it contains the action a t representing the selected AP for the connection and the reward r t associated with the chosen action in terms of network performance, such as latency and energy efficiency. To facilitate the training process, the training data are continuously stored in the iAP controller's storage as new connections are made. From this dataset, a batch of training data is randomly selected for training the EL-RL model. This random selection helps ensure a diverse and representative sample of the training instances. To further enhance the learning process, the reward for each action in the selected training data is adjusted using the Proximal Policy Optimization (PPO) algorithm [23]. By adjusting the rewards, the model can better estimate the impact of each action on future network performance. During each epoch of training, the model's parameters are iteratively updated using randomly chosen training data. This iterative process allows the model to gradually improve its performance and adapt to various network conditions. The training continues for several epochs until the total reward converges, indicating that the model has learned a stable policy for AP selection.
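The data-collection part of this procedure can be sketched as follows. The class and field names are our own (hypothetical), and the sketch covers only storing one instance per new connection and drawing a random batch; the PPO update itself is not shown.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transition:
    """One training instance, recorded when an MT connects to an AP."""
    state: Tuple[float, ...]  # RSSI per AP followed by connected-MT count per AP
    action: int               # index of the AP selected for the connection
    reward: float             # reward computed from energy and latency


class ConnectionLog:
    """Append-only store of transitions kept by the (assumed) controller."""

    def __init__(self) -> None:
        self._items: List[Transition] = []

    def record(self, transition: Transition) -> None:
        self._items.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        """Draw a random batch without replacement (smaller if the log is short)."""
        return rng.sample(self._items, min(batch_size, len(self._items)))

    def __len__(self) -> int:
        return len(self._items)


log = ConnectionLog()
for step in range(10):
    # Hypothetical values: three RSSI readings and three load counters.
    state = (-50.0 - step, -60.0, -70.0, float(step), 2.0, 1.0)
    log.record(Transition(state=state, action=step % 3, reward=-0.1 * step))

batch = log.sample(4, random.Random(0))
print(len(log), len(batch))
```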

Collision Probability
The energy consumption caused by traffic retransmissions resulting from packet collisions is demonstrated in Figure 7. When an IoT device and any other IoT device try to transmit a packet simultaneously during the first transmission attempt from the IoT device perspective, a collision occurs between the transmitted packets, and a timeout for the IoT device occurs because an ACK (Acknowledgement) packet has not been received. Once the channel becomes idle again, the IoT device attempts a second transmission using a random backoff time within the doubled contention window size. The same process applies to the collisions encountered during the second through sixth transmission attempts. If a collision happens even on the seventh transmission attempt, the packet is discarded, and there is no further retransmission attempt. To examine the energy consumption attributed to retransmissions, we conduct mathematical calculations of collision probabilities based on realistic collision simulations. As per the IEEE 802.11 standard, we consider that the IoT device can transmit the same packet a total of seven times, which includes the initial transmission attempt. Hence, the maximum number of retransmission attempts (m) is six [24,25], the minimum contention window size $CW_{min}$ is 31 time slots, the maximum contention window size $CW_{max}$ is 1023 time slots, and the maximum number of recursive attempts to increase CW is 6 [24,25]. We define the collision probability for each transmission attempt as $P_c(n)$ and the transmission attempt probability as $P_a(n)$ as follows. The transmission probability of the nth transmission attempt, $P_a(n)$, is given by Equation (1). In this paper, a collision occurs when more than one IoT device uses the same time slot for attempting uplink transmissions. For example, when one device among N devices is trying to transmit within a certain time slot, another device among the other N − 1 devices may try to transmit simultaneously.
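The retransmission rule described above (the contention window doubles after each collision, is capped at the maximum, and the packet is dropped after the seventh attempt) can be sketched as below. The doubling formula `(CW_MIN + 1) * 2**(n - 1) - 1` is the usual binary exponential backoff convention and is our assumption here; the function names are hypothetical.

```python
import random

CW_MIN = 31        # minimum contention window in time slots (from the text)
CW_MAX = 1023      # maximum contention window in time slots (from the text)
MAX_ATTEMPTS = 7   # initial transmission plus six retransmissions


def contention_window(attempt: int) -> int:
    """Contention window size for a 1-based transmission attempt."""
    if not 1 <= attempt <= MAX_ATTEMPTS:
        raise ValueError("attempt must be between 1 and 7")
    return min((CW_MIN + 1) * 2 ** (attempt - 1) - 1, CW_MAX)


def draw_backoff(attempt: int, rng: random.Random) -> int:
    """Random backoff, in time slots, drawn uniformly from [0, CW]."""
    return rng.randint(0, contention_window(attempt))


schedule = [contention_window(n) for n in range(1, MAX_ATTEMPTS + 1)]
print(schedule)  # [31, 63, 127, 255, 511, 1023, 1023]
```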
We take into account the concurrent transmission attempt in the following collision model. The transmission collision probability is formulated from a new perspective in Equation (2),

$$P_c(n) = \sum_{a_1, a_2, \ldots, a_7} P_s \, \frac{(N-1)!}{a_0!\, a_1!\, a_2! \cdots a_7!} \, P_a(1)^{a_1} P_a(2)^{a_2} \cdots P_a(7)^{a_7} \, (1 - P_A)^{a_0} \qquad (2)$$

where

$$P_s = \begin{cases} 1, & \text{when there is another transmission,} \\ 0, & \text{when there is no other transmission,} \end{cases} \qquad N - 1 = \sum_{n=0}^{m+1} a_n, \qquad P_A = \sum_{n=1}^{m+1} P_a(n).$$

This collision probability is calculated by considering the packet collision probability within a single arbitrary time slot. Additionally, this collision probability considers the actual collision probability for ML, which can be solved numerically. Concerning the transmitting devices at an arbitrary time slot, the number of devices attempting the first transmission in that time slot is represented by $a_1$, and the number of devices attempting the second transmission is expressed as $a_2$. Likewise, $a_n$ describes the number of devices attempting the nth transmission in that time slot for n = 1, 2, ..., 7. In addition, $a_0$ is the number of devices with no transmission attempt in the same time slot.
The collision probability is defined as the sum of the values multiplied by the number of cases in which collision can happen and the transmission attempt probability. Here, if there is no transmission from any device at that time slot, the P s has a value of 0, and it is considered that no collision has occurred. Furthermore, P A is defined as the sum of all attempted transmission probabilities. The collision probability based on these actual collisions was calculated with numerical techniques.
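To make the multinomial sum concrete, the sketch below evaluates it by brute-force enumeration for a small network. It assumes our reading of Equation (2): the sum runs over all ways of splitting the other N − 1 devices into seven attempt classes plus the silent class, and P s removes only the term in which nobody else transmits. Under that reading, the multinomial theorem collapses the sum to 1 − (1 − P A)^(N−1), which the code uses as a cross-check. The attempt probabilities are hypothetical placeholders, since Equation (1) is not reproduced here.

```python
from itertools import product
from math import factorial, prod


def collision_probability(p_attempt, n_devices):
    """Multinomial sum over the other N - 1 devices, skipping the P_s = 0 term."""
    others = n_devices - 1
    p_idle = 1.0 - sum(p_attempt)  # (1 - P_A): a device stays silent in the slot
    total = 0.0
    for counts in product(range(others + 1), repeat=len(p_attempt)):
        a0 = others - sum(counts)  # number of silent devices
        if a0 < 0 or a0 == others:
            continue  # not a valid split, or nobody else transmits (P_s = 0)
        coeff = factorial(others) // prod(factorial(a) for a in (a0, *counts))
        total += coeff * p_idle ** a0 * prod(p ** a for p, a in zip(p_attempt, counts))
    return total


def closed_form(p_attempt, n_devices):
    """1 - (1 - P_A)^(N - 1), obtained from the multinomial theorem."""
    return 1.0 - (1.0 - sum(p_attempt)) ** (n_devices - 1)


# Hypothetical per-slot attempt probabilities P_a(1..7).
P_ATTEMPT = [0.02, 0.01, 0.005, 0.002, 0.001, 0.0005, 0.0002]

enumerated = collision_probability(P_ATTEMPT, n_devices=5)
shortcut = closed_form(P_ATTEMPT, n_devices=5)
print(enumerated, shortcut)
```

The enumeration grows quickly with N, so for networks of 50 to 200 devices the closed form (or another numerical technique) is the practical route if this reading of the equation holds.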

Energy and Latency of IoT Devices
In this subsection, we present the average energy consumption and the average latency model of IoT devices based on the collision probability. The average energy consumption of IoT devices can be obtained as the sum of the product of the probability of all transmission attempts, the probability of successful transmission without collision, and the energy consumption value according to the nth transmission attempt. The average energy consumption of IoT devices is given as Equation (3). The energy consumed by the nth transmission attempt is the sum of the product of the operation time of each operation mode and the power used in that operation mode, as given in Equation (4).
The total Tx mode time according to the transmission attempt consists of data transmission time and ACK transmission time in Equation (5). The data transmission time can be obtained by multiplying the number of transmission attempts by the time required to send one transmission data, and the ACK transmission time can be obtained by the time required to send an L2ACK message once.
where $N_{data}$ is 104 bytes, $N_{L2ack}$ is 54 bytes, B is 160 kHz, and γ is 40 dB [15]. The total Rx (Receive) mode time according to the transmission attempt is given by Equation (6), where $T_{ACKtimeout}$ is 337 µs, $T_{ACKtime}$ is 44 µs [12], and $T_{beacons}$ is $\frac{I_{period}}{n_{dtim} \cdot I_{beacon}} \cdot t_{beacon}$ µs [15]. The time spent in Rx mode is the sum of the ACK time, the beacon reception time, and the product of the number of transmissions sent so far and the ACK timeout.
The total sleep mode time according to the transmission attempt is given by Equation (7), where $I_{period}$ is the transmission period of 1 s. The total sleep mode time per transmission attempt can be obtained by subtracting the Tx mode time and the Rx mode time from the period. In addition, the adaptive Tx power according to the distance, $P^{adaptive}_{tx}$, can be obtained from Equation (A1) in Appendix A [15]. The average uplink latency of IoT devices is calculated by Equation (8). The average latency is composed of the average backoff time, the average transmission time for successful delivery, and the average collision time for transmission failure according to the nth transmission attempt. The average backoff time of the nth transmission attempt is given by Equation (9), where the time slot is 20 µs and $W_{initial}$ is 16 by default [12]. The average transmission time for the successful delivery of the nth transmission attempt is given by $T_a(n) = (n-1)\,T_c(n) + \frac{N_{data}}{B \log_2(1+\gamma)}$, where SIFS is 10 µs, $T_{ACKtime}$ is 44 µs, and DIFS is 50 µs [12]. The average collision time for transmission failure of the nth transmission attempt is given by Equation (11), where $T_{ACKtimeout}$ is 337 µs and DIFS is 50 µs [12].
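The per-attempt energy model described in this subsection (energy as the sum over operation modes of time multiplied by power, with the sleep time taken as the remainder of the transmission period) can be sketched as follows. The power values and the Tx and Rx durations in the example are hypothetical placeholders, not the paper's simulation parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePower:
    """Power drawn in each operation mode, in watts (hypothetical values below)."""
    tx_w: float
    rx_w: float
    sleep_w: float


def attempt_energy(t_tx_s: float, t_rx_s: float, period_s: float, power: ModePower) -> float:
    """Energy in joules for one period: sum of (mode time x mode power)."""
    t_sleep_s = period_s - t_tx_s - t_rx_s  # sleep fills the rest of the period
    if t_sleep_s < 0:
        raise ValueError("Tx and Rx time exceed the transmission period")
    return t_tx_s * power.tx_w + t_rx_s * power.rx_w + t_sleep_s * power.sleep_w


power = ModePower(tx_w=0.5, rx_w=0.2, sleep_w=0.001)
energy_j = attempt_energy(t_tx_s=0.002, t_rx_s=0.001, period_s=1.0, power=power)
print(energy_j)
```

Because each retransmission lengthens the Tx and Rx times within the same 1 s period, the same function shows how collisions shift time from the cheap sleep mode into the expensive active modes.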
To calculate the average energy consumed by an IoT device, we use the RSSI values and the number of IoT devices that are connected to the iAPs. Moreover, to calculate the average latency of an IoT device, we use the number of IoT devices that are connected to the iAPs for load balancing. The objective function of the proposed EL-RL model is defined in Equation (12), where α and β are the weights of the average energy consumption and the average latency, respectively. The goal of the objective function is to minimize the weighted sum of the average energy value and the average latency value.
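As a reading aid, the verbal description of the objective corresponds to a weighted sum of the following form. The symbols $J$, $\bar{E}$, and $\bar{L}$ are our own notation for the objective, the average energy consumption, and the average latency; the exact form and any normalization used in Equation (12) may differ.

```latex
\min \; J = \alpha \, \bar{E} + \beta \, \bar{L}, \qquad \alpha, \beta \ge 0
```

Since energy and latency are measured in different units, the weights α and β presumably also set the relative scale of the two terms, not only their priority.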

Performance Evaluation
The simulator uses Python and the PyTorch library for the PPO algorithm implementation [26]. The parameter settings for simulation are shown in Table 3. For the simulation, we assume the total number of APs is three, the distance between the APs is 20 m, and the cell coverage is 15 m. In addition, it is assumed that the IoT devices in each AP's cell are normally distributed with respect to the iAP location, which is placed at the center of the cell. The distribution ratios of IoT devices between APs are assumed to be [1:1:1], [1:9:9], and [1:10:3]. These represent the balanced, two-hotspot, and one-hotspot scenarios, respectively. The total numbers of IoT devices applied to the simulation are 50, 100, 150, and 200. Reinforcement learning of the EL-RL model is performed based on the PPO algorithm, which has shown strong performance and fast learning in various fields [27] (Figure 8). The reasons for using the PPO algorithm are as follows. First, it is rare for a sequence to produce a similar state because the state in a sequence is defined by the distance between the IoT device and the AP and the number of devices connected to the AP. Second, in order to train the EL-RL model from a large number of varied sequences, we must carefully consider the effect of current actions on future actions, i.e., the final return value. Therefore, we implement an advantage actor-critic-based PPO algorithm, whose critic can efficiently estimate the return value for the current action.
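A scenario of this kind could be generated as in the sketch below. Several details are our assumptions because they are not stated in this section: the APs are placed on an equilateral triangle with 20 m sides, the standard deviation of the normal placement is 5 m, and fractional device counts are resolved with a largest-remainder rule.

```python
import random
from typing import List, Sequence, Tuple


def split_by_ratio(total: int, ratio: Sequence[int]) -> List[int]:
    """Split `total` devices across APs in proportion to `ratio` (largest remainder)."""
    weight = sum(ratio)
    counts = [total * r // weight for r in ratio]
    remainders = [(total * r) % weight for r in ratio]
    leftover = total - sum(counts)
    order = sorted(range(len(ratio)), key=lambda i: remainders[i], reverse=True)
    for idx in order[:leftover]:
        counts[idx] += 1
    return counts


def place_devices(
    ap_positions: Sequence[Tuple[float, float]],
    counts: Sequence[int],
    sigma_m: float,
    rng: random.Random,
) -> List[Tuple[float, float, int]]:
    """Draw (x, y, home_ap) for each device, normally distributed around its AP."""
    devices = []
    for ap_index, ((ax, ay), count) in enumerate(zip(ap_positions, counts)):
        for _ in range(count):
            devices.append((rng.gauss(ax, sigma_m), rng.gauss(ay, sigma_m), ap_index))
    return devices


AP_POSITIONS = [(0.0, 0.0), (20.0, 0.0), (10.0, 17.32)]  # assumed triangle, 20 m sides

counts = split_by_ratio(100, [1, 10, 3])  # one-hotspot scenario
devices = place_devices(AP_POSITIONS, counts, sigma_m=5.0, rng=random.Random(42))
print(counts, len(devices))
```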
The agent of the proposed EL-RL model is based on the PPO algorithm. The state, action, reward weight values, and epoch of the EL-RL model for the simulation are as follows. We compare three AP selection models for performance evaluation. First, the legacy AP selection model, which uses only the RSSI value to select an AP, is denoted as 'legacy AP with RSSI'. Second, the proposed iAP selection model that uses only the RSSI value to select an iAP, with adaptive Tx power, is denoted as 'proposed iAP with RSSI'. Last, the proposed iAP selection model that uses the EL-RL agent to select an iAP is denoted as 'proposed iAP with EL-RL'. For these three AP selection models, we consider three cases regarding the distribution ratios of IoT devices between APs as follows.
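The three distribution cases can be reproduced with a short placement routine. The following sketch splits a total device count by ratio and draws each device position from a normal distribution centered on its AP. The equilateral-triangle AP layout, the 5 m standard deviation, the rule that assigns the rounding remainder to the last AP, and all names are assumptions of this sketch; the paper specifies only the 20 m AP spacing, the 15 m coverage, and the ratios.

```python
import math
import random

# Assumed layout: three APs on an equilateral triangle with 20 m sides.
AP_POSITIONS = [(0.0, 0.0), (20.0, 0.0), (10.0, 10.0 * math.sqrt(3.0))]


def place_devices(total, ratio, ap_positions=AP_POSITIONS, sigma_m=5.0, seed=0):
    """Return (devices per AP, list of (x, y) device positions)."""
    rng = random.Random(seed)
    weight_sum = sum(ratio)
    # Integer split by ratio; any remainder goes to the last AP.
    counts = [total * r // weight_sum for r in ratio]
    counts[-1] += total - sum(counts)
    devices = []
    for (ap_x, ap_y), count in zip(ap_positions, counts):
        for _ in range(count):
            # Positions are not clipped to the 15 m coverage radius here.
            devices.append((rng.gauss(ap_x, sigma_m), rng.gauss(ap_y, sigma_m)))
    return counts, devices


if __name__ == "__main__":
    for ratio in [(1, 1, 1), (1, 9, 9), (1, 10, 3)]:
        counts, _ = place_devices(100, ratio)
        print(ratio, counts)
```

Devices that fall in the overlap between cells are the ones for which the AP choice actually matters, so the standard deviation controls how much room the selection scheme has to rebalance the load.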
The results for each model in all experiments are the average values obtained from 500 simulations. Figure 9 presents the average energy consumption of IoT devices according to the distribution ratio between APs. In all cases, the average energy consumption of IoT devices increases as the number of devices increases. For Case 1, the energy consumption of the two proposed iAP models (namely 'proposed iAP with RSSI' and 'proposed iAP with EL-RL') is lower than that of the 'legacy AP with RSSI' model, but the values of the two proposed models are comparable, as shown in Figure 9a. Since Case 1 is already load-balanced, the two proposed models perform similarly. However, the two proposed iAP models consume only 63% to 66% of the energy consumed under the legacy AP model because of the adaptive Tx power and the prompt ACK reception function in the iAP system.
In Case 2 and Case 3, as shown in Figure 9b,c, respectively, the average energy consumption of IoT devices increases with the number of IoT devices, a trend similar to Case 1. In Case 2, where there are two hotspot APs, the two proposed iAP models consume 62% to 65% of the energy consumed under the legacy AP model. In Case 3, where there is only one hotspot AP, the two proposed iAP models perform better still, consuming 47% to 64% of the energy consumed under the legacy AP model. In particular, the 'proposed iAP with EL-RL' model performs best in Case 3, with an energy consumption of only 47.1% of that of the 'legacy AP with RSSI' model for a total of 100 IoT devices. This is because the 'proposed iAP with EL-RL' model has a better load-balancing effect, which reduces retransmission energy. Figure 10 displays the average energy consumption of IoT devices for each case with respect to the number of IoT devices.
The results indicate that the two proposed iAP models outperform the legacy AP model in terms of energy consumption. In particular, the 'proposed iAP with EL-RL' model demonstrates the best energy consumption performance, reducing energy consumption to 47.1% of that of the legacy AP model (a reduction of about 53%) in the 1:10:3 distribution with 100 IoT devices. This outcome is due to the load-balancing scheme of the 'proposed iAP with EL-RL' model, which selects the optimal AP while taking into account both energy consumption and latency.
Figure 11 presents the average uplink latency of IoT devices according to the distribution ratio between APs. Figure 11a shows the average uplink latency of IoT devices for Case 1. The average uplink latency of each model increases with the number of IoT devices because of retransmissions resulting from packet collisions. However, the two proposed iAP models exhibit almost the same average uplink latency as the legacy AP model, since the APs are already load-balanced. Figure 11b,c shows the average uplink latency of IoT devices for Case 2 and Case 3, respectively. In Case 2, where there are two hotspot APs, the latency of the 'proposed iAP with EL-RL' model is 71% to 94% of that of the legacy AP model. This is because only the 'proposed iAP with EL-RL' model takes the latency of IoT devices into account when selecting the AP. Furthermore, in Case 3, where there is only one hotspot AP, the 'proposed iAP with EL-RL' model performs better still, with a latency of 50% to 82% of that of the legacy AP model. From this, we can see that the latency advantage of the 'proposed iAP with EL-RL' model grows as the load imbalance worsens. The average uplink latency of IoT devices under each case with different numbers of IoT devices is depicted in Figure 12.
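The link between load imbalance, collisions, and retransmissions can be illustrated with a simplified contention model. The sketch below uses the well-known relation p = 1 − (1 − τ)^(n−1) for the probability that a transmission collides when n devices share a cell and each transmits in a slot with probability τ. Treating τ as a fixed constant (0.02 here), assuming independent retries without a retry limit, and the per-AP loads used in the example are all simplifying assumptions of this sketch; it is not the collision analysis used in the EL-RL model.

```python
def collision_probability(n_devices, tau=0.02):
    """Probability that a transmission collides with at least one of the
    other n_devices - 1 contenders (tau: assumed per-slot attempt rate)."""
    if n_devices <= 1:
        return 0.0
    return 1.0 - (1.0 - tau) ** (n_devices - 1)


def expected_attempts(n_devices, tau=0.02):
    """Expected transmissions per delivered packet (geometric retries)."""
    return 1.0 / (1.0 - collision_probability(n_devices, tau))


def mean_attempts(loads, tau=0.02):
    """Device-weighted average of expected attempts over all APs."""
    total = sum(loads)
    return sum(n * expected_attempts(n, tau) for n in loads) / total


if __name__ == "__main__":
    print("one hotspot:", mean_attempts([7, 71, 22]))
    print("balanced:   ", mean_attempts([33, 33, 34]))
```

Because the expected number of attempts grows faster than linearly with the cell load, moving devices from a hotspot to a lightly loaded AP lowers the average number of transmissions, which is the effect that load balancing exploits for both energy and latency.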
The results indicate that in Cases 2 and 3, where load balancing is required, the 'proposed iAP with EL-RL' model is superior in terms of latency to both the 'legacy AP with RSSI' model and the 'proposed iAP with RSSI' model. This is because the EL-RL model minimizes the number of retransmissions through load balancing. Specifically, the 'proposed iAP with EL-RL' model demonstrates the best latency performance, achieving a latency reduction of 50.5% in the 1:10:3 distribution ratio with 100 IoT devices. This outcome is due to the load-balancing scheme of the 'proposed iAP with EL-RL' model, which chooses the optimal AP considering the latency of IoT devices.
Figure 13 presents the expected lifespan of an IoT device under the different distribution ratios between APs. The expected lifespan of an IoT device in Case 2, with the 1:9:9 distribution ratio between APs, is shown in Figure 13a. The 'proposed iAP with EL-RL' model significantly enhances the expected lifespan, by roughly 1.6 to 1.9 times compared to the 'legacy AP with RSSI' model. Furthermore, Figure 13b displays the expected lifespan of an IoT device in Case 3, with the 1:10:3 distribution ratio between APs. Here the 'proposed iAP with EL-RL' model offers an even larger improvement in the expected lifespan, roughly 1.7 to 2.1 times that of the legacy AP model. From this, it can be seen that the energy-saving advantage of the 'proposed iAP with EL-RL' model grows as the load imbalance deepens. The increased expected lifespan of IoT devices under the 'proposed iAP with EL-RL' model can therefore help in providing various IoT services by reducing the need for frequent battery replacement.
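The reported lifespan gain can be related to the energy results with a simple back-of-the-envelope calculation. The sketch below assumes a fixed battery capacity and a lifespan that is inversely proportional to the average energy consumption; this proportionality and the function name are our assumptions and may differ from the lifespan model used in the paper.

```python
def lifespan_gain(relative_energy):
    """Lifespan multiplier when the average energy consumption falls to
    relative_energy times the baseline (fixed battery capacity assumed)."""
    if relative_energy <= 0:
        raise ValueError("relative_energy must be positive")
    return 1.0 / relative_energy


if __name__ == "__main__":
    # 47.1% is the relative energy reported for Case 3 with 100 devices.
    print(round(lifespan_gain(0.471), 2))
```

Under this assumption, consuming 47.1% of the baseline energy corresponds to a lifespan about 2.1 times longer, which is consistent with the upper end of the range reported for Case 3.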
The generalization of IoT device location (i.e., the location of each IoT device changes continuously as the epochs progress) in the EL-RL model is demonstrated in Figure 14 under three cases, each with a total of 100 devices and a different distribution ratio between APs. Figure 14a displays the location generalization for Case 1, where the distribution ratio is 1:1:1. Case 2, with a distribution ratio of 1:9:9, is presented in Figure 14b. Finally, Figure 14c illustrates Case 3, where the distribution ratio is 1:10:3. As the epochs progress, the IoT devices located in the overlapping region tend to select the AP with fewer connected IoT devices, which maintains stable load balancing in terms of energy efficiency and latency. Therefore, regardless of the distribution of IoT devices, the proposed EL-RL model can be stably trained under the generalization of IoT device location and can improve the energy efficiency and latency of IoT devices.
The convergence of the EL-RL model is shown in Figure 15 under the three distribution ratios between APs, in order to examine its performance in various scenarios. Figure 15a depicts the convergence behavior of the model when the distribution ratio is 1:1:1. The reward of the EL-RL model converges after approximately 25 epochs of training, while the energy consumption and latency also gradually converge as the epochs progress. Similarly, the convergence behavior of the EL-RL model when the distribution ratio is 1:9:9 is presented in Figure 15b. As in the previous case, the reward, energy consumption, and latency of the EL-RL model converge after approximately 25 epochs of training. Finally, Figure 15c illustrates the convergence behavior of the EL-RL model when the distribution ratio is 1:10:3. The reward of the model gradually converges as the epochs progress, indicating that the reinforcement learning was successful.
Although the reward changes rapidly in some cases, the range of change decreases as learning progresses, and the reward ultimately converges. Additionally, the energy consumption and latency of the EL-RL model also converge as the epochs progress.
Regarding the training and inference time of the EL-RL model, Table 4 summarizes the average training time per epoch and the average inference time per input instance. The simulations were conducted on a computer system with a 64-bit Intel Core i7-800 CPU and 16 GB of RAM. The results reflect the time required for training the EL-RL model and the inference time for making AP selections across the different cases. It is noteworthy that the average training time per epoch increases with a higher number of MTs. Nevertheless, the overall training completes within 25 epochs, equivalent to less than 10 min. Furthermore, the training and inference procedures can be decoupled. The iAP operates using the most recently updated EL-RL model, which is redistributed to the iAPs whenever the model is updated at the iAP controller with an accumulated training dataset. This approach enables dynamic and iterative training, enhancing the model's effectiveness over time. With an average inference time below 0.5 milliseconds, the EL-RL model has a minimal impact on the overall time required for establishing connections, which is typically measured in seconds [28]. Therefore, the EL-RL model is feasible for real-world AP selection scenarios without significantly increasing connection setup delays. Finally, while the training and inference of the EL-RL model primarily use the CPU, GPU acceleration could further reduce the processing time in both stages.

Conclusions
In this paper, we propose an energy-efficient AP selection scheme for IoT devices that uses reinforcement learning to minimize energy consumption and latency. To achieve this goal, we develop an iAP control system for selecting the optimal AP in Wi-Fi networks. We also introduce a novel energy-efficient AP selection model, the energy and latency reinforcement learning (EL-RL) model, which uses RSSI values and the number of IoT devices connected to APs to balance the load and thereby address the unbalanced-load problem. Furthermore, we control the adaptive Tx power of IoT devices by employing a location estimation ML model and a Tx power recommendation model. We evaluate the proposed scheme by analyzing the energy consumption, uplink latency, and collision probability in Wi-Fi networks. Our results show that the proposed scheme can achieve a maximum improvement in energy efficiency of 53%, a 50% reduction in latency, and a 2.1-times improvement in the expected lifespan of IoT devices.
In future research, it would be valuable to explore the potential limitations and extensions of our proposed scheme. One possible direction is to investigate the applicability of the EL-RL model and iAP control system for different types of IoT devices or in diverse environmental conditions, such as an industrial IoT service. Additionally, the proposed scheme could be adapted to incorporate other relevant factors, such as network congestion or device mobility, to further optimize energy efficiency and latency. By addressing these aspects, we can continue to enhance the performance and versatility of the proposed scheme, making it more robust and adaptable for various IoT scenarios.

Acknowledgments:
We would like to thank the anonymous reviewers for taking the time and effort necessary to review the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A
Equation (A1) gives the adaptive transmitting power according to the distance between the IoT device and the AP, following reference [15]. In Equation (A1), $P_I$ is the measured interference power at the iAP, $P_N$ is the measured noise power at the iAP, and $\gamma^*$ is the target SINR at the minimum bound seen by the iAP. $L$ is the total loss factor between the IoT device and the iAP, and it can be modeled, for example, by using the distance path-loss model with a fading component, i.e., $L = \frac{1}{L_o d^{-a} h}$ [29], where $L_o$ is a constant depending on the transmission frequency and the antenna gains, $d$ is the distance between transmitter and receiver, $B$ is the bandwidth of the channel, $a$ is the path-loss exponent, and $h$ is a random variable representing the channel fading [29,30]. In addition, $\mu$ is the conversion factor of a power amplifier from electric power to RF power, $N_m$ is a fixed message length, $P_e$ is the retransmission probability, and $P_o$ is the electronic power consumption overhead incurred in the communication module to encode a message. Using the Lambert-W function, defined by $W[z]e^{W[z]} = z$, we can calculate the optimal transmitting power of the IoT device [15].
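Since the optimal transmitting power relies on the Lambert-W function, the following sketch shows one way to evaluate its principal branch numerically with Newton's method for non-negative arguments. It illustrates only the defining relation W[z]e^{W[z]} = z and does not reproduce Equation (A1); the function name, the starting point, and the tolerance are choices of this sketch.

```python
import math


def lambert_w(z, tol=1e-12, max_iter=100):
    """Principal branch of the Lambert-W function for z >= 0.

    Solves w * exp(w) = z with Newton's method, starting from log(1 + z),
    which lies at or above the root for every z >= 0.
    """
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)
    for _ in range(max_iter):
        exp_w = math.exp(w)
        # Newton step for f(w) = w * exp(w) - z, with f'(w) = exp(w) * (w + 1).
        step = (w * exp_w - z) / (exp_w * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


if __name__ == "__main__":
    w = lambert_w(1.0)
    print(w, w * math.exp(w))
```

Where SciPy is available, `scipy.special.lambertw` provides the same function, including other branches and complex arguments.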
The adaptive Tx power according to the distance can be obtained from Equation (A1), and the resulting value follows a logarithmic curve as the distance increases, as shown in Figure A1.