Online Offline Learning for Sound-Based Indoor Localization Using Low-Cost Hardware

Online Learning algorithms and Indoor Positioning Systems are complex applications in the environment of cyber-physical systems. These distributed systems are created by networking intelligent machines and autonomous robots on the Internet of Things using embedded systems that enable the exchange of information at any time. This information is processed by Machine Learning algorithms to make decisions about current developments in production or to influence logistics processes for optimization purposes. In this article, we present and categorize the further development of the prototype of a novel Indoor Positioning System, which constantly adapts its knowledge to the conditions of its environment with the help of Online Learning. Here, we apply Online Learning algorithms in the field of sound-based indoor localization with low-cost hardware and demonstrate the improvement of the system over its predecessor and its adaptability for different applications in an experimental case study.


I. INTRODUCTION
This paper presents the further development of an innovative Indoor Positioning System (IPS). IPS are gaining importance with the increasing development of the Internet of Things (IoT). Industrial applications such as
• autonomous robots in smart factory environments,
• warehouse management applications for large devices or
• automated repacking stations
require information about the positions of participating components in a life-long learning [1] [2] application to enable AI-based, resource-conserving optimization. The quality of the position information, such as accuracy, real-time capability and efficiency, is directly related to the possibility of saving resources. The costs for purchasing and operating the required devices influence the cost-effectiveness of the applications and thus their acceptance by industrial companies. In [3], accuracy and costs of the most popular IPS are compared. The paper concludes that there is still no satisfactory overall solution, because accurate solutions are expensive or cannot localize in real time, while cheap IPS are inaccurate. Furthermore, most recent technologies are based on triangulation and therefore require multiple, often expensive, devices. In addition, centimeter-scale technologies are either expensive or prone to failure. In this paper, we present an alternative technology that promises to yield outstanding results with cost-effective hardware. In the long run, it aims at replacing existing technologies, which often incur high maintenance and acquisition costs while offering only medium-quality services.
Regarding the required hardware, we only use a single 2 USD electronic microphone, a single ordinary loudspeaker and the WiFi-capable ESP8266 based IoT-Kit Octopus [4], and improve localization performance by combining efficient Machine Learning (ML) algorithms like K-Nearest-Neighbor (KNN), K-Means and Learning Vector Quantization (LVQ) in a distributed cloud application. We apply these algorithms to room impulse response (RIR) records, which depend on the position of the microphone, the room geometry and the setup of the room. If the environmental conditions remain constant, the suitability of the RIR for localization becomes apparent. Figure 1 schematically shows the relationship between the RIR and the position of the recording device. As the microphone moves away from the loudspeaker, the time required for the signal to reach the microphone increases and the measured intensity decreases. The Direct Sound (red impulse) reaches the microphone first; the First Reflections (green and blue impulse) are received later and weaker due to the larger distance and the energy loss caused by acoustic reflection. The locations of microphones 1 and 2 can be distinguished based on the different RIRs $s_1(t)$ and $s_2(t)$. Figure 2 shows two RIRs for the same position, where the right RIR is affected by a major room change. The plots show 2000 measured values recorded over a period of 0.2 seconds at a sampling rate of 10 kHz. Based on this information, a rule-based algorithm could be used to locate the microphone, but it would be highly susceptible to interference if changes were made in the room.

FIGURE 1: Schematic illustration of the room impulse response. The direct sound (red) reaches the microphones earlier and with a higher amplitude than the first reflections (green and blue), in line with signal propagation theory [4].
Therefore, we investigated the use of algorithms of Unsupervised Learning (UL) and Supervised Learning (SL) that use Online Learning (OL) at runtime to adapt to changes in the localization environment.

A. RELATED WORK
We already demonstrated the applicability for static room setups [4] in 2018, where we extended a nature-inspired method [5] that needs neither high spatial accuracy nor big microphone arrays. That method is inspired by [6] and [7] and uses K-Means-generated representative prototypes in a KNN model. With this model, an 88 % success rate can be achieved in distinguishing 16 squares with a side length of 15 cm on a 60 cm × 60 cm table top. Since the classification relies on the RIR, which depends not only on the position of the microphone but also on the room geometry and the objects in the room, this method only works in very similar room configurations and is thus only conditionally suited for real applications.
In order to counter these problems, a learning algorithm is used to adapt to room changes at runtime. To this end, a combination of batch learning and OL [8] based on LVQ is used to address the stability-plasticity dilemma [9] [10], which describes the problem of a learning system having to preserve acquired knowledge while new knowledge is being built up at the same time. In this offline online learning architecture (OOLA), basic knowledge is acquired in a batch learning process before runtime (offline) to achieve stability. In the case of sound-based indoor localization (SBI), this is achieved by initially learning the RIR from different positions in space using fingerprinting procedures. In order to realize plasticity at runtime, the algorithm learns step by step, which is realized by the use of LVQ (online). The two procedures are combined by a dynamic selection strategy, which uses information from both knowledge bases for a decision.

B. CONTRIBUTION
We extend the functionality of our IPS with low-cost hardware for more complex environments and improve the quality of positioning in static environments. To this end, we adapt the described OOLA to the problem of our RIR based IPS and use LVQ to improve our initial prototypes depicting the classification model for static room setups. Further, we classify our system among other IPS.

FIGURE 3: Experimental setup. The twelve blue squares depict the locating range; the book represents the obstacle which changes the RIR of the room. The measurements are carried out along the cable duct (x-direction) centimetre by centimetre, row by row (y-direction).

C. STRUCTURE
Section II gives an overview of existing IPS and classifies ours regarding the given criteria. It gives a short overview of the applied algorithms and a simulation of their behaviour. Section III describes the components of the learning architecture and reviews the functionality by a case study of our IPS. Section IV summarizes the results and gives an outlook on further developments.

D. NOTATION
Vectors are indicated by bold letters $\mathbf{x}$, matrices by bold capital letters $\mathbf{M}$; transpositions are indicated by $\mathbf{x}^T$ or $\mathbf{M}^T$. Sets are marked with calligraphic font, e.g. $\mathcal{S}$; their cardinality is expressed by $|\mathcal{S}|$.

II. IPS & SIMULATION
In this section, we give a short overview of existing IPS and present our method; further, we describe simulations of the algorithms used.

A. IPS
The survey [3] compares the IPS that existed in 2017. It rates current approaches based on audible sound, with an accuracy in the range of meters, as inaccurate. First positioning experiments that evaluate the RIR provide promising results in the decimeter range [4]. Other presented methods allow more precise positioning, but require a high amount of resources in the form of high-quality hardware or complex algorithms. The routers required for WiFi-based methods are usually available anyway, but must still be included in the resource calculation. Our approach only requires a few inexpensive devices and computers of average performance:
• a MAX4466 microphone to be located,
• an ordinary speaker giving the signal,
• a usual ESP8266 based controller board,
• and a common i5-laptop.
The survey [3] proposes a classification of the IPS according to the following three main criteria:
1) signal carrier: distinction between radio, light, sound or magnetic waves
2) signal processing: distinction between active and passive systems
3) signal structure: distinction between signals with and without embedded information.
If the mobile device receives a signal and the signal is processed by peripheral devices, this is referred to as active signal processing. Passive systems are characterized by mobile devices that process a received signal to determine their own position. Since the SBI uses the RIR, the mobile device receives a signal sent by the infrastructure, which is a criterion for passive signal processing. However, since this signal is evaluated in the periphery after transmission, the procedure is classified as active signal processing. Signals with embedded information contain coded information which can be used to estimate the location, e.g., device codes in light bulbs or time stamps in GPS [3]. Accordingly, the present methodology can be categorized as 1) a sound based locating method with 2) active signal processing 3) without embedded information.
The technique used in this work can be described as a combination of fingerprinting and evaluation of the signal propagation behaviour. Fingerprinting is already used in applications with WiFi, Bluetooth or magnetic waves. At first, a map of signal recordings (fingerprints) is created. This map allows conclusions to be drawn about the position by comparing stored data with the data to be localized [11]. Kalman or particle filters are frequently used to suppress unwanted interference caused by superimposition and noise [12]. Signal propagation deals with the behaviour of signals during their propagation in space. The strength of the signal decreases with distance; walls and obstacles lead to reflections and interference.
The IPS discussed in this article uses this signal behaviour to distinguish different positions by means of automatic learning procedures. To the best of our knowledge, there is no similar methodology for indoor positioning. The listed sound-based methods mainly work via direct evaluation of signal propagation, mostly using multilateration methods that rely on signal propagation durations (e.g., time of flight) for position determination. A passive method [13] based on fingerprinting of audio recordings compares background noise from rooms and is not comparable in its accuracy. It is stated in [3] that no satisfactory solution has yet been found in the area of indoor localization. Precise technologies are too expensive or do not provide results in real time.

B. LOCALIZATION SCENARIO
The core of the experiment is the localization of a microphone on a table using ML methods. The data required for this is generated by a loudspeaker repeatedly transmitting the same signal, which is recorded by a microphone at different positions on the test table. Figure 3 shows the table used, with a 90 cm × 60 cm table top. The loudspeaker is fixed on the right side of the table and transmits its signal over the locating range in the direction of a reflecting wall to the left. To avoid possible ambiguity due to room symmetry, the loudspeaker is positioned at an angle of approximately 10° to the reflecting wall. The locating range corresponds to a 20 cm × 60 cm area 30 cm in front of the loudspeaker, which is divided into twelve squares with an edge length of 10 cm. The experiment is carried out in an ordinary office of 6 m × 3 m × 2.5 m, furnished for two persons. The aim of the SL algorithm is to assign the distinguishable recordings of the microphone to one of the twelve areas in order to estimate the position of the microphone. To do this, the MAX4466 microphone is placed centimetre by centimetre in each of these twelve fields to produce labeled comparison recordings (fingerprinting). To ensure uniform alignment across all recordings and to improve positioning accuracy, the microphone is mounted on the sliding cover of a cable duct. This cable duct can be moved parallel to the wall via two angles. Both sides of the table and the cable duct cover were calibrated for centimetre-accurate positioning of the microphone. Recording the microphone positions serves the subsequent visualization and evaluation of the results and the generation of the labels within the framework of SL. During test execution, the microphone cable is mounted inside the cable duct, with the IoT-Kit Octopus attached to the underside of the table, in order to avoid possible interference of the RIR by obstacles.
The centimeter-wise distribution of the RIR recordings over the locating range results in $l_{pos} = 1200$ different positions (100 per 10 cm × 10 cm square, see Figure 3). At each position, three recordings are taken to obtain sufficient training and test data. Due to physical conditions, hardware characteristics of the devices used, and noise, the three recordings at the same position are not identical, but have a similar profile. We use the first instance of each position as training data. In order to simplify processing and to guarantee or improve accuracy, all recordings were taken along the 60 cm long x-direction (along the cable duct), row by row in the 20 cm y-direction, as depicted in Figure 3. This leads to 180 recordings per row, in the order depicted in Table 1.
This is repeated for the first ten rows, representing the width of the locating range. From the eleventh row on, the same procedure is repeated for labels 7-12, leading to a total of 3600 recordings. After the introduction of a room change by an obstacle, the changed RIR is recorded again three times at the same 1200 positions in the same order to generate the online test data of 3600 additional files. The order also plays a role when testing the algorithm, since the test data vectors are classified in this order.
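The recording scheme above can be sketched as a mapping from a centimetre position to one of the twelve area labels. This is a minimal sketch under stated assumptions: the function names `area_label` and `recording_order` are hypothetical, labels are assumed to count upward along the x-direction, and the three recordings of a position are assumed to be taken back to back. The actual order is given by Table 1.

```python
def area_label(x_cm: int, y_cm: int) -> int:
    """Map a position to one of the twelve 10 cm x 10 cm areas.

    x_cm runs 0..59 along the cable duct, y_cm runs 0..19 across it.
    Assumption: labels 1-6 cover the first ten rows, labels 7-12 the rest.
    """
    if not (0 <= x_cm < 60 and 0 <= y_cm < 20):
        raise ValueError("position outside the 60 cm x 20 cm locating range")
    return (x_cm // 10) + 1 + 6 * (y_cm // 10)


def recording_order(instances: int = 3):
    """Yield (x_cm, y_cm, instance, label) row by row along the x-direction.

    Assumption: all instances of one position are recorded consecutively.
    """
    for y_cm in range(20):
        for x_cm in range(60):
            for instance in range(1, instances + 1):
                yield x_cm, y_cm, instance, area_label(x_cm, y_cm)


records = list(recording_order())
print(len(records))            # total number of recordings
print(records[0], records[-1])
```

Under these assumptions, each row contributes 180 recordings and each area receives 100 positions, matching the counts stated above.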

1) Signal Generation
The signal is an audio file played over an ordinary loudspeaker, which is driven by the peripheral computer. The audio file is created by a Matlab script. The lower cutoff frequency, or Schroeder frequency $f_{\mathrm{Schroeder}}$, is estimated for a room with a volume of $V = l \cdot b \cdot h = 3\,\mathrm{m} \cdot 6\,\mathrm{m} \cdot 2.5\,\mathrm{m} = 45\,\mathrm{m}^3$ and a reverberation time $T_N \approx 0.2$ s. In a heuristic procedure, however, the frequency of 100 Hz was determined. After the user presses a button on the IoT-Kit, the data acquisition process begins. The IoT-Kit sends a Message Queuing Telemetry Transport Protocol (MQTT) message using topic play to the peripheral computer, which triggers playback of the audio file. Figure 4 illustrates the machine to machine (M2M) communication for data acquisition.
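As a plausibility check for the Schroeder frequency mentioned in this subsection, the room values can be inserted into the commonly used approximation. The constant 2000 and the use of $T_N$ as the reverberation time in this formula are assumptions taken from the standard textbook form, not from the source:

```latex
f_{\mathrm{Schroeder}} \approx 2000 \sqrt{\frac{T_N}{V}}
  = 2000 \sqrt{\frac{0.2\,\mathrm{s}}{45\,\mathrm{m}^3}}
  \approx 133\,\mathrm{Hz}
```

Under this assumption, the heuristically chosen 100 Hz lies somewhat below the estimated value.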

2) Signal Recording
The generated audio file, containing a 100 Hz sine sound of 1 ms duration, is played by a distributed cyber-physical system (CPS) [14]. Since this requires a certain amount of transmission and processing time, the IoT-Kit waits for a heuristically determined time span until the recording starts. This waiting time ensures synchronization between recording and playback of the signal. The recording time is $t_r = 1$ s at a sampling rate of $f_s = 10$ kHz, resulting in raw data of length $l_{raw} = 10000$ measured values. We store all sampled measurements $s(mT)$ in a vector which represents the raw data of a recording [4].

3) Data Transmission
After recording, the data is sent to the cloud for storage and further processing via MQTT topic transmit. Due to limited storage resources on the IoT-Kit, we store the measured values in a byte array, where each value is represented by 2 bytes. Given the maximum MQTT payload size of 1 kByte per message, we split the 20000 bytes into 20 MQTT packages for transmission. The function byte_to_int calculates integer values from the submitted bytes. This allows us to leverage the maximum capacity of our IoT-Kit by outsourcing complex calculations and data structures.
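The packaging step can be illustrated with a short sketch. The name `byte_to_int` comes from the text, but its signature here is an assumption, as are the big-endian byte order, the unsigned 16-bit encoding, the helpers `int_to_bytes` and `split_payload`, and the dummy sample values.

```python
import struct

L_RAW = 10_000         # samples per recording (1 s at 10 kHz)
BYTES_PER_VALUE = 2    # each measured value occupies 2 bytes
PAYLOAD_SIZE = 1_000   # assumed usable payload per MQTT message, in bytes


def int_to_bytes(values):
    """Pack samples as unsigned 16-bit integers (big-endian is an assumption)."""
    return struct.pack(f">{len(values)}H", *values)


def split_payload(data: bytes, size: int = PAYLOAD_SIZE):
    """Cut the byte array into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def byte_to_int(chunks):
    """Rebuild the integer samples from the received chunks (hypothetical signature)."""
    data = b"".join(chunks)
    return list(struct.unpack(f">{len(data) // BYTES_PER_VALUE}H", data))


samples = [(i * 7) % 1024 for i in range(L_RAW)]   # dummy values, not real data
packets = split_payload(int_to_bytes(samples))
restored = byte_to_int(packets)
print(len(packets), restored == samples)
```

Each chunk would be published as one message on topic transmit; the MQTT client itself is left out to keep the sketch free of external dependencies.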

4) Data Preprocessing
After the data transfer is complete, the IoT-Kit sends a command in topic convert to trigger the create .csv node, which runs a Python script that merges the 20 generated plain text files into a valid .csv file, converting the received data into a suitable format for further processing. To limit the amount of data, a Matlab script cuts 0.2 seconds from the recording of the pulse to create the work data from the raw data. In order to cut the most informative 0.2 s, we search for the highest peak and cut shortly before it to extract the following $l_{work} = 2000$ values of the recording. After standardisation, these values serve as the work data $\mathbf{x}$ and as the basis for the formation of the long-term memory (LTM). We formalize a work data vector $\mathbf{x}_n$ by:

$$\mathbf{x}_n = [x_{n;1}, \dots, x_{n;l_{work}}]$$

where $n = 1, \dots, 3600$ is the sequential number of the recording and the entries of $\mathbf{x}_n$ are in the range of [−1; 1]. We formalize the training data set $\mathbf{M}$ as a matrix of column-wise arranged work data vectors $\mathbf{x}_n$, and the Offline Setup Test Data as matrices OffData1, starting with $x_{2;1}$, and OffData2, starting with $x_{3;1}$, using the second and third instance, respectively, of the work data from the three recordings at each position. This way, we build a similar, but not identical, test set simulating the offline scenario, in which no room change is applied. The Online Setup Test Data sets OnData1, OnData2 and OnData3, which simulate the online case of a room setup changed by an obstacle in the locating range (see the book in Figure 3), are structured in the same way.
For the online case, no separate training data is required because the algorithm learns while testing, so we can use all three instances for testing purposes. The five test data sets are handed over to the OOLA via ${}^{O}L$, which incrementally transfers each single test data vector, one after another, to the classification algorithm.
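The cutting and standardisation step described above could look roughly as follows. The source performs this step in a Matlab script, so this Python version is only a stand-in. The function name `extract_work_data`, the offset `pre_peak` (how far before the peak the cut starts), the mean removal, and the scaling by the largest absolute value are all assumptions.

```python
def extract_work_data(raw, l_work=2000, pre_peak=50):
    """Cut l_work samples starting shortly before the highest peak and scale
    them into [-1, 1]. pre_peak and the scaling rule are assumptions."""
    mean = sum(raw) / len(raw)
    centered = [v - mean for v in raw]
    # Index of the sample with the largest deviation from the mean.
    peak = max(range(len(centered)), key=lambda i: abs(centered[i]))
    # Start shortly before the peak, but keep the window inside the recording.
    start = max(0, min(peak - pre_peak, len(centered) - l_work))
    window = centered[start:start + l_work]
    scale = max(abs(v) for v in window) or 1.0
    return [v / scale for v in window]


# Dummy raw recording: constant offset with a short pulse at sample 3000.
raw = [512] * 10_000
raw[3000], raw[3001], raw[3002] = 900, 300, 600
x = extract_work_data(raw)
print(len(x), max(x), min(x))
```

With a 10 kHz sampling rate, the 2000 retained samples correspond to the 0.2 s window named in the text.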

C. ALGORITHMS
In this subsection, we describe the basic algorithms used and simulate the behaviour of LVQ.

1) K-Means
Since we work with limited resources, we first calculate twelve representative prototypes from the 100 training records of each area using K-Means [15], resulting in $l_{pro} = 144$ prototypes for the $L = 12$ areas. K-Means splits the passed set of 100 data vectors into twelve groups, also known as clusters, and returns their centers, which we use for further classification. The calculation of clusters leads to a small loss of information, a so-called in-sample error, which we accept in order to increase computational efficiency [5]. The in-sample error describes the error of the model on the training data. We reach classification rates of about 98 % (2 % in-sample error) in our example, which is a fair trade-off for reducing the model from 1200 instances to 144. Cross-validation, using the second instance of the three recordings at each position as training data set, leads to similar results. In this paper, our results refer to the use of the first instance of the three recordings at each position as training data.
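The prototype generation can be sketched with a plain Lloyd-style K-Means in pure Python. This is an illustrative sketch, not the implementation used in the experiments: the function names, the fixed iteration count, the random initialisation from the data, and the low-dimensional dummy vectors (8 entries instead of 2000) are assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns k cluster centers."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: sq_dist(v, centers[j]))
            clusters[nearest].append(v)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def build_prototypes(training, k=12):
    """training maps an area label to its work data vectors.
    Returns a list of (prototype, label) pairs, k per area."""
    return [(center, label)
            for label, vectors in sorted(training.items())
            for center in kmeans(vectors, k)]


# Dummy data: 12 areas with 100 short random vectors each.
rng = random.Random(1)
training = {label: [[rng.gauss(label, 0.3) for _ in range(8)] for _ in range(100)]
            for label in range(1, 13)}
prototypes = build_prototypes(training)
print(len(prototypes))
```

Running K-Means separately per area is what keeps each prototype tied to exactly one label.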

2) KNN
The 144 K-Means-generated representative prototypes are labeled according to their twelve areas of origin. They have the same structure as the work data $\mathbf{x}_n$, renamed to $\mathbf{p}_n$:

$$\mathbf{p}_n = [p_{n;1}, \dots, p_{n;l_{work}}]$$

where $n = 1, \dots, 144$ and the entries of $\mathbf{p}_n$ are in the range of [−1; 1] due to standardization. In addition, they are assigned to an area by means of a label $l_n \in \{1, \dots, L\} = \mathcal{L}$ and thus form the model of a KNN algorithm [16], the LTM ${}^{C}L_{\text{K-Means}}$, where the superscript C indicates the Constant, unchanged LTM. The LTM is used to decide which of the prototypes, and thus which label or area, is most similar to the test data, in order to classify the test data and locate the microphone in a specific area. To do so, the SL algorithm compares a new signal with the set of all labeled prototypes and determines the K nearest records based on a selected criterion, e.g. the Euclidean distance. KNN checks the classes (labels) of those K nearest neighbors and assigns the most frequently occurring class to the signal to be classified in order to estimate its position. Since the possible classes are directly related to a defined position, the position of the data record is thereby determined.

Algorithm: KNN
Find the K nearest neighbors to $\mathbf{x}_0$ from ${}^{C}L$
Identify the class with the majority of the K points
Output: Class for sample $\mathbf{x}_0$

We already applied the combination of K-Means and KNN in [4], resulting in about 88 % correct classifications on a 60 cm × 60 cm field on a table divided into 16 areas of 15 cm × 15 cm. There, the measuring points were distributed intuitively in a star shape within the squares; in this paper, we verify the approach to the nearest centimeter. This method, also known as a lazy learner, has advantages with regard to the complexity of the algorithm used and thus the runtime. KNN algorithms are also particularly suitable for SBI based on the evaluation of the RIR,
• because they are able to classify directly into multiple groups,
• because the evaluation of neighbors gives resistance to distant outliers,
• because their input is only the training data, so that a KNN model can be adapted to new training objects by OL.

FIGURE 5: Improving K-Means with LVQ. The right plot shows the moved representative prototypes and the classification improvements.
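The KNN decision described above can be written in a few lines. This is a minimal sketch: the function name `knn_classify`, the model layout as a list of (prototype, label) pairs, and the default K = 3 are assumptions, since the value of K is not stated in this section.

```python
import math
from collections import Counter


def knn_classify(x0, model, k=3):
    """Return the majority label among the k prototypes nearest to x0.

    model is a list of (prototype, label) pairs; Euclidean distance is used.
    """
    nearest = sorted(model, key=lambda pl: math.dist(x0, pl[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy two-dimensional model with two areas.
model = [([0.0, 0.0], 1), ([0.0, 1.0], 1),
         ([5.0, 5.0], 2), ([5.0, 6.0], 2), ([6.0, 5.0], 2)]
near_area_1 = knn_classify([0.2, 0.1], model)
near_area_2 = knn_classify([5.0, 5.4], model)
print(near_area_1, near_area_2)
```

Because the model is just a list of labeled prototypes, adapting it at runtime amounts to appending, moving or removing entries.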

3) LVQ
The left part of Figure 5 visualizes the classification of a simulation with 400 two-dimensional, randomly generated dummy data vectors assigned to four groups. The small points represent the data sets, the large points represent calculated prototypes, crosses in the respective colors represent the classification of the data sets based on the model of the calculated prototypes. The calculation of the 12 representative prototypes of each group causes a lack of information which leads to misclassifications of the training data (in-sample failure). If only K-Means is used to calculate the prototypes, the classification of the training data results in 95.25 % correct predictions. Especially in the border areas (solid black lines) errors often occur, see orange circles. The right picture shows an incremental manipulation of the prototypes using LVQ leading to less faulty classifications. The prototypes were shifted in 25 incremental learning processes so that the classification is 100 % correct. The effect is particularly noticeable in the right center of the image, at the border between the purple and green areas. The LVQ [17] is another supervised classification method using a set of representative prototypes. First, the prototypes are initialized randomly. Then the nearest prototype is determined per data vector. If the class of the determined prototype corresponds to that of the data vector, the prototype is shifted in the direction of the current data vector. If the classes are different, the prototype is shifted in the opposite direction. The strength of the shift is controlled by a factor α(t), the so-called learning rate, which decreases depending on the time until all data vectors are processed or an abort criterion is fulfilled [18]. Figure 6 illustrates the process using the example of three randomly selected prototypes in the upper left corner and three sets of training vectors. 
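The basic LVQ update described above (attract the nearest prototype if the classes match, repel it otherwise) can be sketched as follows. The function names and the linearly decaying schedule for α(t) are assumptions; the text only states that the learning rate decreases over time. Instead of random initialisation, the sketch takes an existing prototype list as input, as is done when refining K-Means prototypes.

```python
import math


def lvq1_step(x, label, prototypes, alpha):
    """One LVQ update in place. prototypes is a list of [vector, label]."""
    j = min(range(len(prototypes)), key=lambda i: math.dist(x, prototypes[i][0]))
    p, p_label = prototypes[j]
    sign = 1.0 if p_label == label else -1.0   # attract or repel
    prototypes[j][0] = [pi + sign * alpha * (xi - pi) for pi, xi in zip(p, x)]
    return j


def train_lvq1(data, prototypes, alpha0=0.05, epochs=25):
    """Repeat the update over all labeled vectors with a decaying learning rate."""
    for epoch in range(epochs):
        alpha = alpha0 * (1 - epoch / epochs)  # assumed schedule for alpha(t)
        for x, label in data:
            lvq1_step(x, label, prototypes, alpha)
    return prototypes


protos = [[[0.0, 0.0], "a"], [[1.0, 1.0], "b"]]
lvq1_step([0.2, 0.0], "a", protos, alpha=0.5)   # same class: moves towards x
lvq1_step([0.9, 1.0], "a", protos, alpha=0.5)   # other class: moves away from x
print(protos)
```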
In Figure 6, the adjustment is done
• in the left picture by an incremental learning procedure with a learning rate of α = 0.05,
• in the right picture by a batch learning procedure with a learning rate of α = 0.1 [19].

FIGURE 6: LVQ processed in online (left) and batch (right) mode [19].

In batch learning, the prototypes are updated only after all data vectors have been processed; in incremental learning, the prototypes are updated after each training pattern [19]. LVQ has already been extensively modified for the implementation of incremental learning methods [20]. One approach is to update the prototypes only if the influencing training pattern lies at one of the decision boundaries between the classes concerned. A basic LVQ in [21] defines a factor s by

$$\min\left(\frac{d_1}{d_2}, \frac{d_2}{d_1}\right) > s$$

and thereby controls in which cases an update takes place. LVQ 2.1 does not meet the convergence criterion, but provides a suitable entry point to make more complex extensions easier to understand. The distances $d_1$ and $d_2$ represent the lengths of the connection vectors to the nearest prototypes (Euclidean distance). The distance to the nearest prototype $\mathbf{p}_j$ of the same class as $\mathbf{x}_i$ is described by $d_1$. The distance $d_2$ represents the distance to the nearest prototype of a class different from that of $\mathbf{x}_i$.
If $d_1$ and $d_2$ are almost equal, the minimum quotient is close to one. If $\mathbf{x}_i$ is closer to one of the two prototypes $\mathbf{p}_1$ or $\mathbf{p}_2$, $d_1$ and $d_2$ are further apart and therefore $\min(d_1/d_2, d_2/d_1)$ is smaller. The appropriate choice of s can be used to control which training patterns may trigger a learning process. Figure 7 illustrates the adjustment of the prototypes by LVQ 2.1. The prototypes $\mathbf{p}_1$ and $\mathbf{p}_2$ are represented by the colored crosses; red and blue dots mark the last 40 training patterns of the respective classes red and blue. The black dots and lines represent the positions and shifts of the prototypes during the last 40 learning phases. The learning rate α depends on t, which stands for the actual number of updates. The time factor t is only increased if the prototypes are actually shifted; further, s = 0.5 was selected. Plot 40 of Figure 7 shows the position of the prototypes after the first 40 training patterns. The prototypes were initialized at coordinates (2,2) and (2,7); the training patterns were randomly generated with a maximum deviation of ±1 around the initial positions of the prototypes. Since the training patterns of the red and blue class are always relatively close to their prototypes, weak shifts (black lines) of the prototypes rarely occur, because the quotient stays below s = 0.5 most of the time.
If the training patterns change their labels, as in Plot 80 and Plot 120 of Figure 7, the effect of the shift is more pronounced. The prototypes learn their new position from the labeled training patterns and forget the old one. So they adapt to the new conditions. By choosing s and α the speed of the effect can be controlled. If the labels are reversed again, as shown in Plot 160 and 200 of Figure 7, the effect can be undone. According to this principle, the effects of changes in space on the RIR can be learned.
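The window rule of LVQ 2.1 discussed above can be sketched as a single update step. The function name `lvq21_step` and the constant learning rate are assumptions; the prototype start positions (2,2) and (2,7) and s = 0.5 are taken from the simulation description.

```python
import math


def lvq21_step(x, label, prototypes, alpha, s=0.5):
    """One LVQ 2.1 style update. prototypes is a list of [vector, label].
    Returns True if the prototypes were shifted."""
    same = [p for p in prototypes if p[1] == label]
    other = [p for p in prototypes if p[1] != label]
    if not same or not other:
        return False
    p1 = min(same, key=lambda p: math.dist(x, p[0]))    # nearest correct prototype
    p2 = min(other, key=lambda p: math.dist(x, p[0]))   # nearest wrong prototype
    d1, d2 = math.dist(x, p1[0]), math.dist(x, p2[0])
    # Update only inside the window, i.e. when min(d1/d2, d2/d1) > s.
    if d1 == 0 or d2 == 0 or min(d1 / d2, d2 / d1) <= s:
        return False
    p1[0] = [pi + alpha * (xi - pi) for pi, xi in zip(p1[0], x)]  # attract
    p2[0] = [pi - alpha * (xi - pi) for pi, xi in zip(p2[0], x)]  # repel
    return True


protos = [[[2.0, 2.0], "red"], [[2.0, 7.0], "blue"]]
updated_near = lvq21_step([2.0, 2.5], "red", protos, alpha=0.1)    # far from border
updated_border = lvq21_step([2.0, 4.0], "red", protos, alpha=0.1)  # near border
print(updated_near, updated_border, protos)
```

A pattern close to its own prototype leaves the model unchanged, while a pattern near the class border shifts both prototypes, which is the behaviour described for relabeled training patterns.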
In various other variants, the LVQ can be easily adapted to the requirements of the application, e.g. the simultaneous updating of several prototypes per training sample is possible.
A further extension which optimizes the displacement of the prototypes is the Generalized Learning Vector Quantization (GLVQ) developed by Sato and Yamada [21]. They show that approaches such as that of Kohonen [17] give good results, but do not necessarily converge to the global optimum. By introducing a cost function that is minimized, the lowest possible error rate is achieved.
Finally, Sato and Yamada introduce the Relative Distance $\mu(x)$ with

$$\mu(x) = \frac{d_1 - d_2}{d_1 + d_2}$$

where $d_1$ is the distance to the nearest prototype of the same class and $d_2$ the distance to the nearest prototype of another class. Since $d_1$ describes the distance to the prototype of the same class, $\mu(x)$ becomes negative if the classification is correct. To improve the error rate, $\mu(x)$ should decrease over the amount of data vectors entered, similar to the factor s used in the LVQ 2.1 procedure. Sato and Yamada propose a function with a single maximum at $\mu = 0$, whose width decreases with increasing time, to ensure that prototype updates are only performed at similar distances to the training pattern. The intensity of the update depends on $\mu$, and thus on $d_1$ and $d_2$. By choosing $\mu$ appropriately, the convergence criterion can be met by this procedure.
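The relative distance can be computed directly from a labeled prototype set. This is a minimal sketch assuming the form µ(x) = (d1 − d2)/(d1 + d2) with plain Euclidean distances; the function name `relative_distance` and the toy prototypes are hypothetical.

```python
import math


def relative_distance(x, label, prototypes):
    """mu(x) = (d1 - d2) / (d1 + d2) for a vector x with known label.

    prototypes is a list of (vector, label) pairs. d1 is the distance to the
    nearest prototype of the same class, d2 to the nearest one of another class.
    Requires at least one prototype of each kind and d1 + d2 > 0.
    """
    d1 = min(math.dist(x, p) for p, l in prototypes if l == label)
    d2 = min(math.dist(x, p) for p, l in prototypes if l != label)
    return (d1 - d2) / (d1 + d2)


protos = [([0.0, 0.0], 1), ([4.0, 0.0], 2)]
mu_correct = relative_distance([1.0, 0.0], 1, protos)   # closer to own class
mu_wrong = relative_distance([1.0, 0.0], 2, protos)     # closer to other class
mu_border = relative_distance([2.0, 0.0], 1, protos)    # exactly in between
print(mu_correct, mu_wrong, mu_border)
```

The sign convention matches the text: correctly classified vectors yield negative values, vectors on the decision border yield zero.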

D. SIMULATION RESULTS
Since K-Means only calculates representative prototypes from the existing training data with the same label, it carries out a local optimization within each area. As a result, data sets in boundary areas represented by the K-Means prototypes may be closer to prototypes of neighboring areas, causing errors in classification. The shifting of the wrongly classified data sets of all labels by means of LVQ, in contrast, results in a global consideration of all prototypes. The LVQ adjustment shifts the prototypes of the boundary areas until all training data sets are correctly classified (in some cases only about 99 % could be reached) and thus provides a sharper separation in the boundary areas (see Figure 5). In the subsequent tests of the OOLA, the classification success rates of the LVQ-manipulated prototypes ${}^{C}L_{\text{LVQ}}$, which correspond to slightly shifted pure K-Means prototypes ${}^{C}L_{\text{K-Means}}$, will be compared. We therefore examine whether and how well these manipulations affect the classification rate on test data sets. To do this, a Matlab script adapts the previously selected set of prototypes via LVQ and compares the classification rates. To investigate the stability of this effect, 60 sets of prototypes are manipulated using LVQ, and the classification rates before and after are compared. The rates (quotas) are calculated as the ratio of correct predictions to total predictions. After evaluating the results, the following values can be determined:
• maximum quota $q_{max}$ = 92.3 %
• minimum quota $q_{min}$ = 89.2 %
• maximum improvement $i_{max}$ = 4.6 %
• minimum improvement $i_{min}$ = 0.8 %
• mean improvement $i_{mean}$ = 2.5 %
The LVQ thus offers an opportunity to improve the K-Means prototypes and achieves positive results in all tested cases with the same room conditions. On average, about 2.5 % more data points can be classified correctly using the LVQ-manipulated prototypes ${}^{C}L_{\text{LVQ}}$.

III. CASE STUDY -ONLINE OFFLINE LEARNING
In this section, we first explain the components of the learning architecture and how they work together, and then present our case study, which helps to understand the learning methodology.

A. ONLINE OFFLINE MEMORY
Even with improved prototypes, we still face the problem of environmental change. When applying our Online Setup Test Data (blue, red and yellow lines) from a changed environment to this model, the classification rates drop sharply, as shown in Figure 8. The colored lines show the ratio of so far correctly classified to so far processed test data vectors.
We still achieve about 90 % correct classifications on our fingerprinting environment data (Offline Setup Test Data, green and purple line), but the classification rates of the LTM for online data are too poor to be used in a real, changing environment. Fischer et al. [8] emphasize the requirements of modern applications in the field of autonomous robots or driving systems for the use of incremental learning methods. They refer to the complexity of human learning strategies, which, despite the use of incremental methods, preserve basic concepts or basic knowledge, but can nevertheless quickly adapt to important changes. They raise the question of how different mechanisms can be efficiently combined to ensure stability and flexibility at the same time, which are contradictory in their requirements. They address the so-called Catastrophic Forgetting Effect (CFE) [22], which occurs in incremental learning methods and describes the forgetting of already learned knowledge through incrementally acquired information. Another typical problem of incremental methods, in which the concept of classification (the relationship between input and output) changes, is also dealt with. The so-called concept-drift is avoided by the extension of a hybrid architecture, which consists of three components [8]: 1) A kind of static LTM to avoid forgetting and thus ensure stability. This classifier is generated by a batch learning process at the beginning of runtime and is not subject to any changes. 2) A flexible short-term memory (STM) as a second classifier, that starts without knowledge and adapts permanently and flexibly to the circumstances through incremental learning in order to respond to conceptual changes. 3) A decision element which compares the results of the classifications, in order to control the output of the OOLA and the training of the STM based on this data. The essential function of the LTM is to counteract the CFE. 
By separating the batch process from the incremental process, it can be ensured that changes to the prototypes through OL do not affect the static knowledge. The static knowledge $C^L$ is generated before runtime in the described fingerprinting process (Sections II-C1 to II-D).
The rating of the prototypes in the STM, and the choice made by the decision element as to which of the two classifiers is responsible in a given case, are based on the Relative Similarity introduced in [23]. The authors negate $\mu(x)$ from GLVQ to obtain the relative similarity $\mathrm{relSim}(x)$:

$$\mathrm{relSim}(x) = -\mu(x)$$

This formally expresses the quality of the classification. The values of $\mathrm{relSim}(x)$ lie in the range $[-1, 1]$:
• Values close to one indicate a high level of certainty for the classification, since the distance $d_1$ to the prototype of the class corresponding to the label of $x$ is small.
• Values close to zero indicate a high degree of uncertainty, since the distance to the nearest correct prototype is approximately as large as the distance to the nearest incorrect prototype.
• Values close to minus one indicate a wrong classification.
The latter two are of special importance for online learning. Since it makes no sense to learn vectors that are already classified correctly, the misclassified ones are especially interesting: they provide the highest gain in information.

FIGURE 9: Overview of the overall process. After data generation and preprocessing for training and test data creation, the LTMs are built up first. These components are then used to test the OOLA.

To test the OOLA, we hand over the five test matrices
1) OnData1,
2) OnData2,
3) OnData3,
4) OffData1, and
5) OffData2
together with their origin labels described in Section II-B as the online learning test matrix $O^L$, one after another. A test data vector $x^O_n$ equals the recording $x_n = [x_{n;1}, \dots, x_{n;l_{work}}]$ and carries the label $l^O_n \in \{1, \dots, L\}$, with $n = 1, \dots, l_{pos}$. These vectors are classified by the LTM, which leads to multiple misclassifications. Plasticity is improved by using the misclassified data vectors with their correct label (provided by SL) as additional prototypes in the STM [24]. Once the STM has been built, only data vectors that the two classifiers cannot classify correctly are learned. To save resources, keep the computing time as short as possible and preserve the adaptability of the algorithm, prototypes should also be removed from the STM when they are no longer useful.
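The Relative Similarity described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the usual GLVQ form $\mu(x) = (d_1 - d_2)/(d_1 + d_2)$, where $d_2$ (a symbol introduced here) is the distance to the nearest prototype of any other class, and it assumes Euclidean distance. The function name and the toy prototypes are hypothetical.

```python
import math


def rel_sim(x, label, prototypes):
    """Relative similarity of x with respect to its true label.

    prototypes: list of (vector, label) pairs; at least one prototype with
    the given label and one with a different label must be present.
    """
    # d1: distance to the nearest prototype of the correct class.
    d1 = min(math.dist(x, p) for p, l in prototypes if l == label)
    # d2: distance to the nearest prototype of any other class (assumption).
    d2 = min(math.dist(x, p) for p, l in prototypes if l != label)
    if d1 + d2 == 0.0:
        return 0.0  # x coincides with prototypes of both classes
    return (d2 - d1) / (d1 + d2)  # equals -mu(x) under the assumed GLVQ form


# Toy setup: one prototype per area on a line.
protos = [((0.0, 0.0), 1), ((4.0, 0.0), 2)]
for x in [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]:
    print(x, rel_sim(x, 1, protos))
```

In this toy setup the three vectors, all labeled 1, move from the prototype of area 1 towards the prototype of area 2, so the value passes from positive through zero to negative, matching the three cases listed above.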

B. LEARNING AND FORGETTING
To evaluate the usefulness of the prototypes, the value of the relative similarity is summed up per prototype in the STM. If this sum falls below a certain value (here 0), the prototype is often responsible for wrong classifications and should be removed from the STM [25]. The STM $K^L$ starts without any knowledge, i.e. $K^L = \emptyset$. Two mechanisms are used to generate the incrementally learned online knowledge:

1) Add prototypes:
Based on the idea of [26], a Workbuffer $W$ contains incorrectly classified online test data vectors $x^O_i$ as candidates for promotion to an STM prototype, up to a maximum memory capacity $e$. They are stored together with their correct label $l^O_i$ and the relative similarity value $\mathrm{relSim}^C(x^O_i)$, which is calculated with the prototypes of the LTM $C^L$ during classification. If the buffer is full, for each class occurring in $W$ the candidate with the smallest value of relative similarity is stored in the STM $K^L$ as a new prototype. This results in selecting data sets that have a high uncertainty and therefore are potentially useful.

2) Remove prototypes:
Similar to [25], the usefulness of prototypes is evaluated by their relative similarity. For this purpose, each STM prototype is assigned a parameter $\mathrm{relSim}^K$ for the accumulation of the relative similarities, which describes its usefulness for cost minimization. If a prototype is the best matching unit (the nearest neighbor) for a new data vector, the evaluated relative similarity is added to this parameter. After a heuristically determined number of $m$ learned data vectors, all prototypes with negative accumulated relative similarity are deleted.
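The removal mechanism above can be sketched as follows. This is a minimal sketch under stated assumptions: the class `StmPrototype` and the functions `credit_bmu` and `forget_step` are hypothetical names, and the best matching unit is assumed to have been determined elsewhere.

```python
from dataclasses import dataclass


@dataclass
class StmPrototype:
    vector: tuple
    label: int
    acc_rel_sim: float = 0.0  # accumulated relSim^K, starts at 0


def credit_bmu(stm, bmu_index, rel_sim_value):
    """Add the relative similarity of a classification to its best matching unit."""
    stm[bmu_index].acc_rel_sim += rel_sim_value


def forget_step(stm, processed, m):
    """After every m processed vectors, drop prototypes with negative accumulated relSim."""
    if processed % m == 0:
        return [p for p in stm if p.acc_rel_sim >= 0.0]
    return stm


# Hypothetical run with m = 2: prototype 0 collects a negative value and is dropped.
stm = [StmPrototype((0.0, 0.0), 1), StmPrototype((1.0, 1.0), 2)]
credit_bmu(stm, 0, -0.4)
stm = forget_step(stm, processed=1, m=2)  # nothing happens yet
credit_bmu(stm, 1, 0.3)
stm = forget_step(stm, processed=2, m=2)  # pruning takes place
print([p.label for p in stm])
```

With small values of m a prototype can be deleted after a single unfavorable classification, whereas larger values give it time to collect positive contributions first.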
We examine how the number of processed data vectors $m$ until deletion and the variable size $e$ of the buffer $W$ affect the error rate of the classification. A candidate equals a column of the Workbuffer $W$ and includes:
• the vector $x^W_n$, which equals the $i$-th processed test data vector of the online test data matrix $O^L$ that led to the $n$-th misclassification and therefore to a negative $\mathrm{relSim}^C(x^O_i)$,
• the label $l^W_n \in \{1, \dots, L\} = \mathcal{L}$, which is the label $l^O_i$ of this processed test data vector,
• and $\mathrm{relSim}^W_n$, the (negative) value of $\mathrm{relSim}^C(x^O_i)$ calculated with the LTM $C^L$. Since the decision element only adds vectors with negative values of $\mathrm{relSim}(x^O_i)$ as candidates, $\mathrm{relSim}^W_n \in [-1, 0)$,
with $n \in \{1, \dots, e\}$.

The negative value serves as an indicator of how well the test data vector $O^L(i)$ qualifies as a candidate. The vector is added to $W$ together with its origin label $l^O_i$ (given by the SL) and the negative value $\mathrm{relSim}(x^O_i)$. When $W$ holds $e$ candidates, then for each label $l^W_n$ present in $W$, the candidate carrying this label with the lowest $\mathrm{relSim}^W$ is promoted to a representative prototype $p^K_{l_{act}+1}$ and added to the STM $K^L$, where $l_{act}$ is the number of prototypes currently in $K^L$. The added prototype consists of the following elements:
• the prototype $p^K_{l_{act}+1}$ equals the data vector $x^W_n$ of the candidate $W(n)$ that was promoted,
• the label $l^K_{l_{act}+1} \in \{1, \dots, L\} = \mathcal{L}$ is the label $l^W_n$ of this promoted candidate,
• and $\mathrm{relSim}^K_{l_{act}+1}$ is the value accumulated over all classifications in which this prototype is used, with the initial value $\mathrm{relSim}^K_n = 0$.

Fischer et al. conclude in [8] that the appropriate choice of the parameters depends on the use case and on the complexity and accuracy of the resulting models.
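The promotion of candidates from the Workbuffer can be sketched as follows. This is a minimal illustration of the selection rule described above; the function name `promote` and the tuple layout `(vector, label, rel_sim)` are assumptions made for this example.

```python
def promote(buffer, e):
    """Promote one candidate per label once the Workbuffer holds e candidates.

    buffer: list of (vector, label, rel_sim) candidates with rel_sim < 0.
    Returns (new_prototypes, remaining_buffer). Each new prototype is a
    (vector, label, accumulated_rel_sim) tuple starting at 0.0.
    """
    if len(buffer) < e:
        return [], buffer  # not full yet, keep collecting
    best = {}
    for vector, label, rel_sim in buffer:
        # Keep the candidate with the lowest relative similarity per label.
        if label not in best or rel_sim < best[label][2]:
            best[label] = (vector, label, rel_sim)
    new_prototypes = [(v, l, 0.0) for v, l, _ in best.values()]
    return new_prototypes, []  # all candidates are erased after promotion


# Hypothetical buffer with e = 3: two candidates for label 1, one for label 2.
candidates = [((0.1,), 1, -0.2), ((0.9,), 1, -0.7), ((2.0,), 2, -0.1)]
new, rest = promote(candidates, e=3)
print(new, rest)
```

With e = 1 the selection step degenerates, because every candidate is promoted as soon as it enters the buffer.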
In the context of this work, the STM has the task of learning changes in the room that lead to a modified RIR. The test data vectors $O^L(i)$ are classified by both classifiers $C^L$ and $K^L$ in the hybrid OOLA. LTM $C^L$ and STM $K^L$ determine the relative similarities $\mathrm{relSim}^C_{temp}(x^O_i)$ and $\mathrm{relSim}^K_{temp}(x^O_i)$ and the labels $l^C_i$ and $l^K_i$ according to their prototypes. These values are passed to a decision algorithm, which decides
• which classification $l^{Out}$ is output after evaluating the relative similarities, choosing the label of the classifier with the highest $\mathrm{relSim}_{temp}$ (Equation 17),
• and whether the data vector is used to train the STM. We [...] but both classifiers still fail, or if the LTM classification is worse than the classification of the STM, which generates many candidates with little chance of being promoted, or if both classifiers yield no better result than 0.2 (20), in order to cover the areas of high uncertainty.

Due to limited resources and computing time, we limit the number of prototypes per label to 13, resulting in $l_{proOn} = 156$ online prototypes $p^K_n$, where $n = 1, \dots, l_{proOn}$. If a prototype is to be learned for a label that already has the maximum number of prototypes, the prototype of this label with the currently lowest accumulated $\mathrm{relSim}^K$ is replaced by the new prototype.
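The output decision and the per-label limit described above can be sketched as follows. This is a sketch under stated assumptions: the function names, the dictionary layout of a prototype and the tie-breaking in favor of the LTM are choices made for this example, and the training conditions of the decision element are deliberately left out.

```python
def decide_output(label_ltm, rel_sim_ltm, label_stm, rel_sim_stm):
    """Output the label of the classifier with the higher relative similarity.

    Ties are resolved in favor of the LTM (an assumption of this sketch).
    """
    return label_stm if rel_sim_stm > rel_sim_ltm else label_ltm


def add_with_cap(stm, new_proto, cap=13):
    """Add a prototype; if its label is at the cap, replace the weakest one.

    stm: list of dicts with the keys "label" and "acc_rel_sim".
    """
    same_label = [i for i, p in enumerate(stm) if p["label"] == new_proto["label"]]
    if len(same_label) < cap:
        stm.append(new_proto)
    else:
        # Replace the prototype with the lowest accumulated relSim^K.
        weakest = min(same_label, key=lambda i: stm[i]["acc_rel_sim"])
        stm[weakest] = new_proto
    return stm


print(decide_output(label_ltm=1, rel_sim_ltm=-0.3, label_stm=2, rel_sim_stm=0.4))
```

With the limit of 13 prototypes per label, the stated total of 156 online prototypes corresponds to 12 labels (13 × 12 = 156).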

C. CLOSER LOOK AT THE RELATIVE SIMILARITY
The Relative Similarity plays the main role in the generation of the STM $K^L$. It is used to evaluate whether a test data vector is worth storing in the Workbuffer $W$, and it is the criterion by which candidates are promoted to an OL vector $p^K$. As already mentioned, it makes no sense for the STM to learn information already present in the LTM. Therefore, [8] recommends learning data vectors which provide a negative $\mathrm{relSim}^C$. Figure 10 shows five different setups for potential OL vectors $p^K$.

FIGURE 10: Effects of the existing prototypes on the Relative Similarity and on the selection of online learning prototypes $p^K$, regarding just two areas (abscissa: feature 1, ordinate: feature 2). The light red ($p^C(1)$) and light blue ($p^C(2)$) crosses show the LTM $C^L$ prototypes of the two areas 1 and 2. Dark red ($p^K(1)$) and dark blue ($p^K(2)$) crosses show five hypothetical online learned prototypes. The light red points are examples of test data vectors from area 1 (SL) to be classified. Tables 2 to 6 show the values of the different Relative Similarities and the decisions of the OOLA.

In Plot 1 of Figure 10, the data vectors near the LTM $C^L$ have been promoted and learned, despite having a high $\mathrm{relSim}^C = 0.89$, as summarized in Table 2.

TABLE 2: Relative Similarities, classification decisions and their reasons - Plot 1 of Figure 10

The classifications of the test data vectors $x^O_1$ to $x^O_3$ from area 1 (with label 1) can be interpreted as follows:
• The test data vector $x^O_1$ is classified correctly (green font in field $l^C_i$) by the STM $K^L$, but the LTM would have led to the same result anyway, so the prototype was of little use.
• The test data vectors $x^O_2$ and $x^O_3$ are classified incorrectly (red font in field $l^C_i$); the STM cannot provide a better $\mathrm{relSim}^K$.
It can be said that learning test data vectors with high relative similarities does not have a particularly beneficial effect.
Comparing the results of Table 3 confirms this observation. However, as the distance between the learnt prototypes ($p^K(1)$ and $p^K(2)$) and the LTM prototypes ($p^C(1)$ and $p^C(2)$) increases, the relative similarities improve slightly. Table 4, referring to Plot 3 of Figure 10, shows no new insights but confirms the improvement of the relative similarities, especially for $x^O_3$. Finally, Table 5, related to Plot 4 of Figure 10, shows the intended effect: the learnt test data vectors that were classified incorrectly by the LTM lead to correct classifications for $x^O_2$ and $x^O_3$. Table 6, related to Plot 5 of Figure 10, confirms these results.

It can also be stated that the $\mathrm{relSim}^C$ values remain constant, while the $\mathrm{relSim}^K$ values improve when using STM prototypes with negative $\mathrm{relSim}^C$. However, this only happens if those test data vectors have not already been classified correctly by the LTM (see $x^O_1$). Selecting the highest Relative Similarity according to Equation (17) improves the overall number of correct classifications.

Figure 9 shows the overall process with the components involved. As part of Data Generation, 7200 individual .csv files are generated, which represent RIR recordings. During Data Preprocessing, these recordings are processed into the training data vectors in M, the offline test data vectors in OffData1 and OffData2, and the online test data vectors in OnData1, OnData2 and OnData3, each with their associated labels.
The first part of the learning section, Data Analysis - Offline Learning, describes the generation of both LTMs: the K-Means prototypes $C^L_{K\text{-}Means}$, which are trained offline in batch mode by K-Means, and the LVQ-manipulated prototypes $C^L_{LVQ}$, which are improved incrementally to determine the suitability of LVQ for improving the classification quality of the LTM. The second part of the learning section, Data Analysis - Online Learning, describes the OOLA, which adapts to changes by evaluating the STM $K^L$ without losing existing knowledge in the LTM.

D. CASE STUDY
In order to compare the quality of the STM and to investigate the behaviour of the algorithms, the Matlab script InitOOL.m loads the prepared data sets and calls the main program OOL.m five times with different work data. This is repeated in five different test cases C1-C5:
C1) Limitation of the number of prototypes to 13: Each area (label) has a maximum of 13 prototypes $p^K$ in the STM $K^L$.
C2) Verification with 19 prototypes per label: The maximum is increased to 19 prototypes $p^K$ per label in the STM $K^L$ to determine the trade-off between improvement and increased computing time.
C3) Random order of test data: 13 prototypes $p^K$ per area are used, and the test data vectors are fed in randomly instead of in the data generation order described in Section II-B.
C4) Further criterion for the deletion of prototypes: 13 prototypes $p^K$ per area are used to determine whether deleting the prototype that has not been involved in a positive classification for the longest time leads to better results than deleting the one with the lowest accumulated relative similarity $\mathrm{relSim}^K$.
C5) Only learning with wrong LTM classification: 13 prototypes $p^K$ per area are used, and no prototypes are learned when the LTM provides a correct classification (only the condition of Equation 18 is used).
While processing these test data sets, the algorithm fills the buffer $W$ as described above until it reaches its maximum size $e$. Each time an online test data vector $O^L(i)$ is classified incorrectly by the LTM, it is added to $W$ as a potential candidate. When the size of $W$ reaches $e$, algorithm LO.m adds one candidate of each label existing in $W$ as a prototype to the STM $K^L$, together with its correct label (the origin of the online test data vector, provided by SL) and an initial value (we use 0) of the accumulated relative similarity $\mathrm{relSim}^W_e$. The candidate with the lowest $\mathrm{relSim}^W_e$ is selected and thus represents the highest gain in information. After learning $m$ test data vectors, the STM prototypes with a negative $\mathrm{relSim}^W_e$ are deleted. These sequences are repeated until the 6000 test data vectors have been processed in five groups, i.e. the five online test data sets:
1) The LTM $C^L$ and an empty STM $K^L$ are used together with the first online test data set OnData1 as $O^L$ after the room change has been introduced. The OOLA starts learning the room changes and returns an STM with selected prototypes. Compared to the blue line in Figure 8, the blue lines of Figure 11 perform better, depending on the values of $e$ and $m$.

If the conditions are met, OL.m further hands over the STM $K^L$ and the filled Workbuffer $W$ to algorithm LO.m, which adapts the STM. Algorithm Learn Online (LO) is the main component for learning the modified RIR. Each time the number of $e$ records in the Workbuffer $W$ is reached, they are passed to LO along with the current STM. LO determines the best candidate for each label present in $W$ and adds it to the STM. After each existing label has been processed, all candidates are erased, $W = \emptyset$. The parameter $e$ can therefore affect the quality of the learned prototypes and the speed of growth of the STM. The higher $e$, the higher the quality of the learned prototypes should be, but the STM takes longer to fill. The lower $e$, the sooner the contents of the Workbuffer are learned; with $e = 1$, for example, each candidate is learned immediately. LO further checks the number of incrementally processed test data vectors and starts the deletion of STM prototypes with negative accumulated relative similarity $\mathrm{relSim}^K$. It should be emphasized once again that algorithm OOL.m receives the test data individually, RIR by RIR, one after the other, and always learns immediately when $W$ reaches the size $e$. To obtain simulation results faster and for reasons of work organization, we have not focused on a variety of agents to be located. Since the data is transferred RIR-wise, however,

IV. RESULTS
The overall results show an increase of the average classification rate to about 95 % for unchanged environments. They also show the ability of the algorithms to adapt to changed environments: depending on the configuration, the OOLA reaches between 72 % and 91 %, compared to about 45 % (see Figure 8) in a pure offline setup with the LTM alone.
These results could be improved further by not limiting the number of prototypes per area, but this leads to very long processing times. The diagrams of Figure 11 show eight success rates of classification under different configurations of $e$ and $m$ with the standard and the improved LTM. When $e$ is increased from 1 to 50, the STM does not learn until $e$ potential candidates for new prototypes have been collected, which explains the sharp drop of the blue curve in the lower four plots, which use $e = 50$. However, a larger $e$ should have a positive long-term effect with larger test data sets, as the quality of the prototypes should increase; with $e = 1$, selecting the best candidate per label is moot. When $m$ is increased from 1 to 50, STM prototypes that have just been learned are no longer deleted as soon as they are responsible for a single wrong classification and thus have a negative accumulated relative similarity. This can have a positive effect on the success rates, because these prototypes may start fitting in the long run.

FIGURE 11: Classification quota of offline and online setup test data by the OOLA, showing good results for changed environments and improved results for the standard room setup, for various combinations of $e$ and $m$. The x-axis represents the $l_{pos} = 1200$ work data vectors defined in Section II-B; the y-axis shows the ratio of test data vectors classified correctly so far to test data vectors processed so far.

Because the test data are processed line by line, only test data of ranges 1-6 are processed up to the 600th data set. Starting from data set 601, the test data of ranges 7-12 are classified. The range change leads to a strong decrease of the quota, since the STM does not yet contain current prototypes of these ranges.
The effect of the improved LTM is not visible in the plots, but a closer look at the classification rates shows that LVQ manipulation leads to an improvement in 1783 of 2500 cases. Figure 12 shows the errors of all five test runs bundled together. The upper two images show the errors when using the non-manipulated LTM, the lower ones the errors when using the LVQ-manipulated LTM. The colour codes shown in Table 8 distinguish between the online and offline data sets to be tested.

[...] high values for $e$ and low values for $m$, the LVQ manipulation sometimes leads to strong deteriorations, e.g. on the online test data set OnData3 for the last two test cases. In 1783 cases an improvement by the LVQ-manipulated LTM is achieved; in 717 cases it leads to a deterioration of the results. Table 9 serves as the color code table for Figure 13.

TABLE 9: Color code for Figure 13
Color          Error difference
green          > 60
light green    > 0
orange         ≤ 0
red            < 60

Tables 10 and 11 show the sum of the classification errors depending on $e$ and $m$ when using the K-Means LTM and the LVQ-LTM, respectively. Cells with a green background indicate the lowest number of errors per column; cells with the highest number of errors per column are marked in red. The lowest number of errors per row is indicated by a green font and the highest by a red font. This form of representation clearly shows that the success rate of the classification is higher at low values of $e$ (green/red cells). There is also a tendency for the number of errors to decrease as $m$ increases (green/red font). Each field shows the number of misclassifications per 30000 test data vectors: 1200 vectors × 5 sets × 5 test cases = 30000 classifications.

V. CONCLUSION
The results of the case study show the robustness of the presented concept for indoor localization via the room impulse response.

The OOLA provides good results for the application of a table demo, but raises the question of whether the results can also be achieved in larger rooms. Tables 10 and 11 clearly show that low values of $e$ (the size of the Workbuffer $W$) lead to better classification results in our setup. However, it is likely that increasing $e$ on significantly larger test data sets will lead to a long-term improvement in the classification rate, as more candidates that are potentially better suited will be available for selection. The parameter is therefore potentially suitable for configuring the algorithm for different use cases. Small values of $e$ could be interesting in cases that are subject to rapid, frequent changes. High values of $e$ should be interesting in stable environments that are subject to minor changes. Tables 10 and 11 also suggest that low values of $m$ (the number of learning operations until deletion of STM prototypes with negative Relative Similarity $\mathrm{relSim}^K$) can lead to prototypes being deleted "too quickly" and thus not developing their potential, since low values of $m$ often yield poorer classification rates. These effects should also be investigated more intensively in further work.

B. OUTLOOK
Further projects in a 3D environment have already been launched to investigate more complex cases than the ones presented in this work. We are planning an extended case study to generate larger test data sets, to set up a live environment with multiple moving agents to be located, and to replace the fingerprinting process with the extraction of an early STM as LTM. The generation of the 7200 raw data files took three working days, so we are considering the development of an agent-based, automated supervised online learning scenario. We are also considering dynamically modifying the parameters $e$ and $m$ to adjust the learning speed depending on the error rate for different use cases, and loading multiple STMs for different situations. In addition, there are further questions to be explored, especially regarding energy consumption: reducing the characteristics transmitted on the IoT-Kit to lower the amount of data, using an alternative technology such as Bluetooth or Zigbee instead of WiFi, or implementing threshold monitoring at the analog-to-digital converter so that the device stays in sleep mode until a sound wave reaches the microphone. Different room configurations, sizes and distances also have to be investigated, not least to determine the required sound pressure corresponding to the range of the process. It might be useful to switch to the infrasound and ultrasound spectrum, but this could create new challenges. Finally, the extremely low hardware costs leave great potential for improving the results by investing in better microphones or more sophisticated algorithms.