Suggestions and Comparisons of Two Algorithms for the Simplification of Bluetooth Sensor Data in Traffic Cordons

Bluetooth sensors in intelligent transportation systems offer extensive coverage and access to a large volume of identity (ID) data, but they cannot distinguish between vehicles and persons. This study aims to classify raw data collected from Bluetooth sensors positioned between various origin–destination (i–j) points into vehicles and persons and to determine their distribution ratios. To reduce data noise, two different filtering algorithms are proposed. The first algorithm employs time series simplification based on the Simple Moving Average (SMA) and threshold models, which are tools of statistical analysis. The second algorithm is rule-based and uses the speeds of Bluetooth devices derived from sensor data. The study area was the Historic Peninsula Traffic Cordon Region of Istanbul, utilizing data from 39 sensors in the region. With time-based filtering, the ratio of person ID addresses among the Bluetooth devices circulating in the region was found to be 65.57% (397,799 person IDs), while the ratio of vehicle ID addresses was 34.43% (208,941 vehicle IDs). In contrast, the rule-based algorithm based on speed data found that the ratio of vehicle ID addresses was 35.82% (217,348 vehicle IDs), while the ratio of person ID addresses was 64.17% (389,392 person IDs). The Jaccard similarity coefficient was used to quantify the similarity of the data obtained from the two filtering approaches, yielding a coefficient (J) of 0.628. The vehicle identity addresses common to the two resulting datasets represent the sampling size for traffic measurements.


Introduction
With the widespread adoption of Intelligent Transportation System (ITS) applications, Bluetooth sensors in particular have become commonly used advanced traffic measurement devices that can measure travel time, average speed, average progression and travel delays [1]. Bluetooth sensors also stand out due to factors such as accuracy, cost-effectiveness and ease of installation and use. With a coverage area of approximately 500 m, Bluetooth sensors do not differentiate between device types when identifying the MAC addresses of devices. A device with Bluetooth capability could be a vehicle's Bluetooth device or one of various objects such as smartphones, smartwatches, computers, wireless headphones, micromobility vehicles or smart devices carried by passengers traveling in vehicles or on public transportation. Bluetooth sensors detect the ID addresses of all objects with Bluetooth capability without filtering among them. Due to the widespread use of Bluetooth technologies, the variety of data coming from sensors has become a problem. When performing traffic calculations, it is not possible to produce realistic results in the current system by using the entire obtained data [2]. In a study conducted to evaluate the scope of Bluetooth technology, an experimental test environment was prepared, and it was indicated that the presence of multiple Bluetooth-containing objects could lead to a performance drop in the MAC protocol [3]. In later years, studies in the United States began to record and track data from Bluetooth sensors. By comparing the travel times generated by license plate reading devices on corridor lines with Bluetooth data, it was observed that average travel times yielded results that were 4-7% more accurate [4]. In a study conducted by Araghi et al. on the reliability of Bluetooth technology, it was found that a Bluetooth-enabled device would be detected with 80% reliability when passing through a sensor location [5]. Bluetooth, with its ability to provide real-time information as opposed to relying on past data, has been identified as a potential candidate for O-D estimation [6]. However, the literature indicates that continuous research is needed to maximize the potential of this technology in traffic management [7]. In Turkey, Bluetooth sensors are used for various purposes such as collecting travel data for different transportation modes, vehicle tracking and monitoring systems, fleet management for logistics companies, cargo tracking, monitoring urban parking areas and providing information about parking spaces. In a study conducted in Istanbul, a method was developed to produce average speed values for Istanbul's urban traffic by using traffic measurement data obtained from sensors on highways, mobile application users and vehicle tracking systems [8]. Another study using air quality indicators and traffic index values for Istanbul examined the changes in traffic index values and air quality indicators during the pandemic period [9]. In a study on intelligent mobility in Istanbul, data obtained from traffic measurement sensors, vehicle tracking and mobile users were processed through the Tukey Fences Clipping algorithm to bring outlier values in the data to normal values [10].
New approaches using machine learning techniques and tree-based algorithms through Bluetooth sensors are available to increase the reliability of Bluetooth-based traffic flow measurements and make it a more desirable and cost-effective solution for real-time traffic flow measurement [11]. In a study conducted to monitor real-time public transport passenger flow and O-D information based on Wi-Fi and Bluetooth sensors, a three-step, data-driven algorithm framework was proposed. The observed passenger flow was used as the ground truth to evaluate the performance of the proposed algorithm and, according to the evaluation results, the proposed algorithm outperformed all selected baseline models and existing filtering methods [12]. In his study, Koçak indicated that determining travel time distribution using detectors with data fusion methods was successful, but Bluetooth sensor data were more reliable for OD matrices [13]. Bugdol conducted traffic prediction using standard Bluetooth data and obtained quite promising results [14]. In a study conducted on pedestrian traffic estimation with Bluetooth sensor technology, a detection methodology that included system calibration considering the likelihood of vehicle traffic congestion, travel time calculation and speed-based classification was proposed, and this methodology yielded an 89% accuracy rate in pedestrian detection [15].
The distribution of vehicles circulating in a region informs the planning of traffic routes. It is also used in determining alternative routes and planning public transportation, and it allows traffic flow to be managed more efficiently, especially during peak hours. For these reasons, determining how the data from sensors are distributed between vehicles and pedestrians supports traffic planning.
The purpose of this study is to simplify the data coming from Bluetooth sensors in cordon areas, such as city centers or busy commercial areas where transportation is heavy, so that they include only vehicles. For this purpose, two different simplification algorithms have been prepared. These algorithms include two basic filtering methods: "time-based filtering" and "speed-based filtering". The results obtained from the Python code were compared with each other. The rule-based modeling used for speed-based filtering offers a simple and interpretable approach, but it may have some limitations in terms of flexibility and generalization power. Therefore, rule-based modeling is often compared with statistical methods and machine learning algorithms in the data analysis and modeling process. Statistical methods and machine learning aim to capture relationships and patterns in data in a more flexible and generalizable way. These methods allow for a broader understanding of the data and can be used to solve more complex problems. For this reason, a time-based filtering approach is also needed.
This article is structured as follows: Section 1 includes the definition of the problem and the review of similar studies in the literature. Section 2 provides an overview of the study's methodology. Section 3 examines the Bluetooth sensor filtering algorithms, including their flowcharts, optimization procedures, and the model for comparing the two algorithms in detail. Section 4 presents the case study. It utilizes data from Bluetooth sensors positioned in the Historic Peninsula region of Istanbul, providing detailed information and addressing exceptional cases. Section 5 contains the results and discussion, which include a comparison of the computational efficiency of the two algorithms.

Methodology
As elements of intelligent transportation systems, Bluetooth sensors do not differentiate between device types when identifying the MAC addresses of devices and detect the ID addresses of all objects with Bluetooth capability without filtering among them. Due to the widespread use of Bluetooth technologies, the diversity of data coming from sensors has become a problem. It is necessary to simplify the data coming from Bluetooth sensors to include only vehicles. This study proposes two different filtering algorithms for simplification. These algorithms include two basic filtering methods: "time-based filtering" and "rule-based speed filtering". The results obtained from the Python code were compared with each other. The comparison was made using the Jaccard similarity method. After the filtering was completed, cases where the identity addresses of devices matched, diverged, or were present in only one dataset were identified. The methodology of the study is presented in the flowchart in Figure 1. Optimization procedures are detailed in Section 3.
Sensors 2024, 24, x FOR PEER REVIEW 3 of 25

Time-Based Filtering
Upon examining the accessed data, it is observed that a data series is formed over time and the data rows are arranged in a periodic cycle (seconds, hours, days, months, years) [16]. The primary goal of analyzing and modeling time series data is to separate noise (the presence of random, meaningless, misleading data that complicate analysis and affect accuracy) from the dataset, thus obtaining an undisturbed series [17]. When filtering based on time, the Simple Moving Average (SMA) and Threshold models, commonly used in time series analysis, are utilized. These two methods, which are statistical analysis tools, are frequently employed in data analysis processes. While SMA holds a significant place in time series analysis, the threshold model is commonly used in simple and fast decision-making processes. An algorithm utilizing these models was coded in the Python programming language and the dataset was parsed. At the end of the parsing, identity numbers (IDs) belonging to the designated categories of vehicle ID and person ID were obtained [2]. If irregular movements dominate in a time series, a moving average model consisting of the following steps is used to determine the general trend of the series and reveal the cyclical effect [18]:
• Accumulation of data regarding the time series;
• Determination of the period;
• Calculation of the average;
• Updating calculations with each new data entry.
The moving average (SMA) value $E(t)$ of the time during which the identity addresses of the devices are detected by the Bluetooth sensor is calculated using Equation (1):
$$E(t) = \frac{1}{N}\sum_{i=1}^{N} t_{ID,i} \qquad (1)$$
• $t_{ID}$: A series including the device ID detected more than once by the Bluetooth sensor, with $t_{ID,i}$ denoting its $i$-th value;
• $N$: Number of periods in the average calculation.
When the Bluetooth sensor detects the ID address of a device, the time period during which the device is within the Bluetooth sensor coverage is recorded in a variable ($t_{BS}$); that is, $t_{BS}$ represents the duration for which the device is within the Bluetooth sensor range.
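The moving-average steps listed above can be illustrated with a minimal Python sketch. The function name, the sample dwell times and the window length are assumptions made for illustration; they are not taken from the study's code.

```python
from collections import deque


def simple_moving_average(durations, n):
    """Return the simple moving averages of `durations` over `n` periods.

    The window is updated with each new value, mirroring the step
    "updating calculations with each new data entry".
    """
    if n <= 0:
        raise ValueError("window length n must be positive")
    window = deque(maxlen=n)  # keeps only the last n observations
    averages = []
    for value in durations:
        window.append(value)
        if len(window) == n:  # emit an average only once the window is full
            averages.append(sum(window) / n)
    return averages


# Hypothetical dwell times (seconds) of one device at a sensor.
dwell_times = [30, 60, 90, 120]
sma = simple_moving_average(dwell_times, n=2)
```

With a window of two periods, each output value is the mean of two consecutive dwell times.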

Threshold Model
This model, used to determine whether an event or condition will occur, sets a threshold and decides on further action based on whether the event surpasses this threshold. The following steps are followed in the Threshold Model:
1. Determining a threshold value according to the requirements of the research.
2. Utilizing the threshold value to make decisions based on data that fall above or below the threshold (Equation (2)).
3. Filtering based on these decisions.
Threshold Control:
$$t_{BS} > k \cdot E(t) \qquad (2)$$
$k$: A coefficient that controls the relationship between the $t_{BS}$ value and the threshold value ($E(t)$). If no specific coefficient is defined, the operation is performed for $k = 1$.
After the total duration each device remained within the sensor coverage area is calculated using the Group by function in Python 3.11.9, $E(t)$ is determined by taking the average of these durations. Finally, decisions are made for data above or below this threshold value. In Equation (2), $t_{BS}$ represents the time the Bluetooth device remains within the sensor range, while $E(t)$ represents the average of this time over all devices. If the condition in the equation is satisfied, the analyzed device is assumed to belong to the "person ID" category. The threshold value thus serves as an "acceptance function" for the person-vehicle separation in the study.
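A minimal sketch of this threshold step follows. It is not the study's code; it makes three assumptions: the input is a list of (idaut, dwell time in seconds) pairs, $E(t)$ is the mean of the per-device total durations, and a device counts as a person ID when its total strictly exceeds $k \cdot E(t)$. The function name and sample records are hypothetical.

```python
from collections import defaultdict


def classify_by_dwell_time(records, k=1.0):
    """Split device IDs into (person_ids, vehicle_ids).

    records: iterable of (idaut, dwell_seconds) pairs.
    k: coefficient applied to the threshold E(t); k = 1 when none is defined.
    """
    total = defaultdict(float)
    for idaut, dwell in records:
        total[idaut] += dwell  # total time per device (the group-by step)

    if not total:
        return set(), set()

    # E(t): mean of the total durations over all devices (assumed reading).
    e_t = sum(total.values()) / len(total)

    # Assumption: strictly above the threshold means "person ID".
    person_ids = {d for d, t in total.items() if t > k * e_t}
    vehicle_ids = set(total) - person_ids
    return person_ids, vehicle_ids


# Hypothetical records: device B lingers far longer than A and C.
records = [("A", 20), ("A", 10), ("B", 300), ("C", 30)]
persons, vehicles = classify_by_dwell_time(records)
```

In this example the totals are 30 s, 300 s and 30 s, so the threshold is 120 s and only device B is labeled as a person ID.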
The algorithm prepared to perform these operations in the Python programming language is presented in Figure 2.
The methodology using time series first calculates the Simple Moving Average (SMA). For this, data pertaining to the time series are collected, and a period is determined. Using code written in the Python programming language, the average time each Bluetooth-equipped device spends at the sensor is determined. In the second part of the model, a threshold is used. In the threshold model, a threshold value is determined according to the research requirements. Here, the threshold value is the SMA (duration) value obtained for each device. Subsequently, using this threshold value, decisions are made, and filtering is performed based on whether the data are above or below the SMA. Accordingly, if a device spends more time at a sensor than its own average under non-traffic conditions, this device cannot belong to vehicles. This is because the time a moving device spends at a sensor under flowing traffic conditions is less compared to other conditions. The threshold value, which expresses the average time spent by Bluetooth devices between consecutive sensors, was examined for periodic time periods. As explained above, devices with remaining time values above the threshold value are in the 'Personal Devices ID' category and devices below it are in the 'Vehicle ID' category.
The above flow chart (Figure 2) was coded in Python programming language and the data were parsed; as a result, the matching Bluetooth devices were captured in the vehicle ID category (Figure 3).These data will be compared with the speed-based filtering algorithm in the next stage.

Speed-Based Filtering
This method is based on the rule-based modeling of speed values. This modeling can be explained in several stages:
• Obtaining distance data between sensors;
• Travel time calculation;
• Using two timestamps recorded by sensors to calculate trip travel time;
• Classification according to device speeds.
For the filtering to be performed, speed data are required. To determine device speeds, both distance and time data are necessary. First, for each device (idaut number), code was written to determine which sensor the device passed through in each 1 h time interval, the time it was first read by the Bluetooth sensor, the time it was last read and the duration it stayed at that sensor during the relevant time interval (in h:min). The flow chart of this code is presented in Figure 4.
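The per-device, per-sensor hourly summary described above might be sketched as follows. This is an illustrative sketch rather than the code behind Figure 4. The column names idsen, idaut and time come from the text; the function name and the sample device ID are hypothetical.

```python
from datetime import datetime


def hourly_sensor_visits(rows):
    """Summarise detections per (idaut, idsen, hour).

    rows: iterable of (idsen, idaut, time) with `time` as a datetime.
    Returns {(idaut, idsen, hour_start): (first_read, last_read, duration)}.
    """
    visits = {}
    for idsen, idaut, ts in rows:
        # Truncate the timestamp to the start of its 1 h interval.
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        key = (idaut, idsen, hour_start)
        first, last = visits.get(key, (ts, ts))
        visits[key] = (min(first, ts), max(last, ts))
    # Duration at the sensor = last read minus first read in that hour.
    return {k: (f, l, l - f) for k, (f, l) in visits.items()}


rows = [
    (241, "dev1", datetime(2023, 4, 1, 8, 5, 0)),
    (241, "dev1", datetime(2023, 4, 1, 8, 20, 0)),
    (230, "dev1", datetime(2023, 4, 1, 8, 40, 0)),
]
summary = hourly_sensor_visits(rows)
```

A device read only once by a sensor in an hour gets a duration of zero, which corresponds to the 00:00:00 residence time case.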
Then, speed measurements were made in 1 h periods for each device read from the Bluetooth sensors in the area. When calculating speed data, the following points must be considered:
• In cases where the device is read by only one sensor, the distance data are defined as 500 m since the sensor coverage area is 500 m.
• In cases where the device is read by more than one sensor, distance data are defined as the distance between sensors.
• In cases where the device's residence time at the sensor was 00:00:00, no action was taken for this device. It was accepted that this device was not in the vehicle ID category.
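The three rules above can be expressed as a short Python sketch. It is not the code shown in Figure 5. It assumes that distances are supplied as a lookup table keyed by sensor pairs, that multi-sensor trips sum the distances between consecutive sensors, and that a speed threshold is passed in by the caller. The text does not state a threshold value, so the 10 km/h used below is purely a placeholder.

```python
SENSOR_COVERAGE_M = 500.0  # single-sensor distance stated in the text


def travelled_distance_m(sensor_ids, distance_m):
    """Distance in metres for the ordered sensors that read one device."""
    if len(set(sensor_ids)) == 1:
        return SENSOR_COVERAGE_M  # read by only one sensor
    total = 0.0
    for a, b in zip(sensor_ids, sensor_ids[1:]):
        if a != b:
            # Look the pair up in either direction of the hypothetical table.
            total += distance_m[(a, b)] if (a, b) in distance_m else distance_m[(b, a)]
    return total


def device_speed_kmh(sensor_ids, elapsed_seconds, distance_m):
    """Return speed in km/h, or None when no speed can be computed."""
    if not sensor_ids or elapsed_seconds <= 0:
        return None  # residence time 00:00:00: no action taken
    return travelled_distance_m(sensor_ids, distance_m) / elapsed_seconds * 3.6


def classify_by_speed(speed_kmh, threshold_kmh):
    """Label a device; devices without a speed are not treated as vehicles."""
    if speed_kmh is None or speed_kmh < threshold_kmh:
        return "person"
    return "vehicle"


distances = {(241, 230): 1200.0}  # hypothetical distance in metres
fast = device_speed_kmh([241, 230], 120, distances)
slow = device_speed_kmh([241], 600, distances)
skipped = device_speed_kmh([241], 0, distances)
labels = [classify_by_speed(s, threshold_kmh=10.0) for s in (fast, slow, skipped)]
```

Keeping the threshold as a required argument avoids hard-coding a value that the study does not report.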
The relevant code and its output are presented in Figure 5.
The flow chart prepared to find the identification numbers of Bluetooth devices read by Bluetooth sensors and the passing speed of these devices through the sensors is as follows. The results will be discussed in the next section.
At the end of this classification, device identification numbers belonging to different categories were identified. The classification algorithm, the time spent by vehicle identification addresses and the method of finding their speed are shown in the flow chart in Figure 6. This flow chart was converted into code in Python 3.11.9, and the vehicle and pedestrian distribution was obtained.

Jaccard Similarity Coefficient
The data accessed as a result of the applied filtering approaches appear to be close to each other in terms of numerical values.However, what is obtained in the result set are two different datasets containing ID numbers of devices with Bluetooth connection.It would not be a realistic approach to expect the IDs in both datasets to match each other exactly.Therefore, the similarities of the models need to be determined.
The Jaccard index, also known as the Jaccard similarity coefficient, is a widely used measure of similarity between sets in various fields such as data mining, clustering and genomics [23]. Overall, the Jaccard index stands out as a fundamental similarity measure with wide applications in various disciplines and offers a standardized approach to measuring similarity between datasets (Figure 7). The Jaccard index measures the similarity between two sets by calculating the ratio of the number of elements common to both sets to the total number of distinct elements in the two sets [24]. The resulting value varies between 0 and 1; 0 represents no overlap between the sets, and 1 represents complete overlap [25]. The two datasets containing the ID numbers of Bluetooth-enabled devices, obtained from the applied filtering approaches, were run through the code written to compute the Jaccard index.
The graph in Figure 8 shows the vehicle identities common to the two datasets of vehicle Bluetooth identities (Intersection), the vehicle identities found only in the set produced by the time-based algorithm (only in df1), the vehicle identities found only in the set produced by the speed-based algorithm (only in df2) and the vehicle identities that appear in exactly one of the two sets (Symmetric Difference).
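The comparison can be illustrated with Python's built-in set operations. The sketch below is not the study's code: the function name and the sample IDs are hypothetical, and the keys only_in_df1 and only_in_df2 simply mirror the labels used for Figure 8.

```python
def compare_vehicle_sets(ids_time, ids_speed):
    """Compare the vehicle IDs from the time-based and speed-based filters."""
    a, b = set(ids_time), set(ids_speed)
    union = a | b
    return {
        "intersection": a & b,          # IDs found by both algorithms
        "only_in_df1": a - b,           # only in the time-based result
        "only_in_df2": b - a,           # only in the speed-based result
        "symmetric_difference": a ^ b,  # IDs found by exactly one algorithm
        # J = |A ∩ B| / |A ∪ B|; for two empty sets this sketch returns 0.0.
        "jaccard": len(a & b) / len(union) if union else 0.0,
    }


result = compare_vehicle_sets({"v1", "v2", "v3"}, {"v2", "v3", "v4", "v5"})
```

Here two IDs are shared out of five distinct IDs, so the coefficient is 2/5.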

Case Study

Selection of the Study Area
The Historic Peninsula of Istanbul, where many transportation modes are used together, serves as a transit route for Istanbul, and data collected from Bluetooth sensors play an important role in intelligent transportation system-based planning. Additionally, due to data availability, the Historic Peninsula of Istanbul is considered a suitable example area. Specific points that can serve as start and end points have been designated in this area, known as the cordon area, and data from 39 Bluetooth sensor devices located between the i–j point pairs have been used.

Access to Bluetooth Sensor Data
In the application part of the study, data from sensors positioned at various locations near the entry and exit points (gates) of the cordon area were accessed through relevant official organizations. The accessed data include:
• Bluetooth sensor numbers (idsen);
• Identity (ID) addresses of objects with Bluetooth devices (idaut);
• Time information at the moment the Bluetooth sensor and device pairing is established (in the format Day-Month-Year; h-min-s) (time).

User Identification
Service users within the coverage area of the Bluetooth sensor in the study area were identified and classified based on information provided by official institutions and observation-based field studies as follows:
• Identity (ID) addresses of vehicles (Private Vehicles, Heavy Vehicles, Light Commercial Vehicles, Ships, Public Transport Vehicles);
• Identity (ID) addresses of individuals (Smartphones, smartwatches, wireless headphones belonging to pedestrians, Bluetooth devices carried by passengers traveling on public transport or in vehicles, bicycles, motorcycles, pedestrians traveling on micromobility vehicles, Bluetooth devices carried by multiple pedestrians traveling together).

Bluetooth Sensor Location Review
Bluetooth sensors in their positioned state within the area are shown in Figure 9.
The distribution of Bluetooth sensors in the Historic Peninsula of Istanbul, along with their sensor numbers, is shown in Table 1. Within these data, a total of 1,293,939 different Bluetooth devices read by 35 different sensor numbers (idsen) (241, 230, 153, 267, 61, 225, 82, 84, 194, 256, 270, 275, 227, 173, 115, 229, 252, 187, 254, 171, 271, 272, 248, 226, 117, 244, 177, 170, 174, 113, 169, 207, 172, 175, 62) were identified. The code used for this process and the output of the code are shown in Figure 10.

Distance Matrix between Bluetooth Sensors
While creating the distance matrix for Bluetooth sensor locations, the Python-Spyde Geopy library was loaded and the geojson file was displayed in the Python script.Th Geopy.distance module, which contains the Geodesic function required for the corre use of the Geopy library and distance calculations, was used.The code (Figure 11) writte for the distance matrix obtained from the location data is as follows.The distance matrix obtained from the location data is as follows (Table 2).

Bluetooth Sensor Data Examination
In the Bluetooth sensor data sorted by time, the first column named idsen represents the sensor number, the second column named idaut represents the identity number of the Bluetooth device that passed by the sensor and the third column named time shows the time when the device was read by the Bluetooth sensor.
The Bluetooth sensors have a horizontal coverage of 110° and a vertical coverage of 30°, with a range of slightly over 500 m (according to information from the manufacturer). Data are transmitted from the Bluetooth sensors to the center in 3 min intervals.
Bluetooth sensor data from the period between 1 April 2023 and 16 April 2023 were accessed. During this time, a dataset of 8,282,609 rows was examined in Python.
When examining the data, device identities (ID) (idaut) that were recorded only once in the dataset by 35 sensors between 1 April 2023 and 16 April 2023 were identified.

Bluetooth Devices Detected Just Once in the Dataset
Considering the coverage area of the Bluetooth sensors and the time in which data are transmitted to the center:

• Being in the Personal Devices ID Category: Data are transmitted to the center every 3 min. A device in the person ID category cannot plausibly be read just once within 3 min, because even at a speed of 0 km/h the sensor would read the device multiple times while it is within the coverage area. A single read is therefore most likely the result of the Bluetooth device being turned on and off within the region. In this case, since there is no second read time for the ID, no speed data can be obtained.

• Being in the Vehicle ID Category: Considering the sensor coverage area, device IDs read just once in the region could only belong to vehicles moving along the periphery of the cordon area. In this case, it would be meaningless to include them in the pilot area traffic counts.
As a result, excluding Bluetooth device identity numbers (IDs) that were read just once in the entire dataset from the main dataset is a necessary step to form the sample size. The flowchart created to perform this process in the Python programming language is shown in Figure 12. There are a total of 687,199 identity numbers of Bluetooth devices that were read only once during the 16 days from 1 April 2023 to 16 April 2023 in the main dataset. When the devices read only once are removed from the main dataset, 606,740 identity numbers of Bluetooth devices remain and the dataset consists of 7,595,410 rows (Figure 13). Future studies will be conducted for these devices.
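The removal step can be sketched as follows. This is a minimal standard-library illustration, not the code behind Figures 12 and 13; the function name `drop_single_reads` and the sample rows are hypothetical, and only the column order (idsen, idaut, time) follows the dataset description.

```python
from collections import Counter


def drop_single_reads(rows):
    """Remove rows whose device ID (idaut) occurs only once in the whole dataset.

    rows: iterable of (idsen, idaut, time) tuples.
    Returns the filtered rows and the set of removed device IDs.
    """
    rows = list(rows)
    counts = Counter(idaut for _, idaut, _ in rows)
    kept = [row for row in rows if counts[row[1]] > 1]
    removed_ids = {idaut for idaut, n in counts.items() if n == 1}
    return kept, removed_ids


# Hypothetical sample rows in the (idsen, idaut, time) layout.
rows = [
    (241, "A1", "01-04-2023 08:00:00"),
    (230, "A1", "01-04-2023 08:04:00"),
    (153, "B2", "01-04-2023 08:05:00"),  # read only once, so it is removed
    (61, "C3", "01-04-2023 08:06:00"),
    (61, "C3", "01-04-2023 08:07:00"),
]
kept, removed_ids = drop_single_reads(rows)
```

Because each removed ID contributes exactly one row, the reported figures are internally consistent: 1,293,939 − 687,199 = 606,740 device IDs and 8,282,609 − 687,199 = 7,595,410 rows.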

Bluetooth Sensor Speed-Based Filtering Algorithm Results
To determine device speeds, both distance and time data are necessary. A code (Figure 4) was developed to track each device (identified by an idaut number) through the sensors. This code records which sensor the device passed through within each 1 h time interval, the first and last time the Bluetooth sensor read the device, and the duration the device remained at each sensor (in h). The output of this code is presented in Table 3. A section from the table showing the identification numbers of Bluetooth devices read by Bluetooth sensors and the passing speed of these devices through the sensors between 1 April 2023 and 16 April 2023 is presented in Table 4.

Device speeds for hourly periods from 1 April 2023 to 16 April 2023 were classified according to the acceptances made in the section Acceptances Based on Speed (section Rule-Based Modeling). At the end of this classification, device identification numbers belonging to different categories were identified. The classification was made with the help of the flow chart shown in Figure 6 (Rule-Based Modeling Algorithm According to Speed Values). A snippet from the table showing the time spent by vehicle ID addresses and their speeds is provided in Table 5.
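The speed calculation can be illustrated with a simplified sketch. It is not the code from Figure 4: instead of hourly first and last reads, it assumes that speed is taken as the inter-sensor distance divided by the time between consecutive reads at two different sensors. The function names, the sample reads and the 2.0 km distance are hypothetical; only the 5.4 km/h (1.5 m/s) pedestrian limit comes from the paper's acceptances.

```python
from datetime import datetime

PEDESTRIAN_MAX_KMH = 5.4  # 1.5 m/s, the pedestrian acceptance stated in the paper


def segment_speeds(reads, distance_km):
    """Speeds (km/h) of one device between consecutive reads at different sensors.

    reads: list of (idsen, datetime) sorted by time.
    distance_km: dict mapping frozenset({idsen_a, idsen_b}) to a distance in km.
    """
    speeds = []
    for (s1, t1), (s2, t2) in zip(reads, reads[1:]):
        if s1 == s2:
            continue  # repeated read at the same sensor: no displacement
        hours = (t2 - t1).total_seconds() / 3600.0
        if hours <= 0:
            continue  # skip simultaneous or out-of-order reads
        speeds.append((s1, s2, distance_km[frozenset((s1, s2))] / hours))
    return speeds


def is_pedestrian(speed_kmh):
    """True if the speed falls below the pedestrian acceptance."""
    return speed_kmh < PEDESTRIAN_MAX_KMH


fmt = "%d-%m-%Y %H:%M:%S"
distance_km = {frozenset((241, 230)): 2.0}  # hypothetical inter-sensor distance
reads = [
    (241, datetime.strptime("01-04-2023 08:00:00", fmt)),
    (241, datetime.strptime("01-04-2023 08:01:00", fmt)),
    (230, datetime.strptime("01-04-2023 08:06:00", fmt)),
]
speeds = segment_speeds(reads, distance_km)
```

In this example the device covers 2.0 km in 5 min, which gives roughly 24 km/h, so it would not be accepted as a pedestrian. The remaining categories would follow the other speed acceptances of the rule-based model.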

Bluetooth Sensor Time-Based Filtering Algorithm Results
By converting the flow chart in Figure 7 into Python code, the threshold value, which expresses the average time spent by Bluetooth devices between consecutive sensors, was examined for successive time periods. Devices whose time values are above the threshold are placed in the 'Personal Devices ID' category and devices below it in the 'Vehicle ID' category. By coding the flowchart in Figure 8 (Finding the vehicle ID number as a result of filtering) in Python, the data were parsed; as a result, 208,941 different Bluetooth device matches were captured in the vehicle ID category. The number of paired Bluetooth devices belonging to individuals was found to be 397,799. According to this filtering method using time series, 34.43% of Bluetooth devices are in the vehicle ID category, while 65.57% are in the person ID category. These data are compared with the speed-based filtering algorithm in the next stage.
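The thresholding idea can be sketched as follows. This is an illustrative reading of the time-based filter, not the code derived from Figures 7 and 8: the per-period mean times, the window length of 3 and the handling of values exactly equal to the threshold are all assumptions made for the example.

```python
def simple_moving_average(values, window):
    """Trailing simple moving average; early points use the values available so far."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


def classify(duration_min, threshold_min):
    """Above the threshold: personal device; otherwise: vehicle.

    Assigning values equal to the threshold to the vehicle category is an
    assumption; the text only specifies 'above' and 'below'.
    """
    return "Personal Devices ID" if duration_min > threshold_min else "Vehicle ID"


# Hypothetical mean times (min) between consecutive sensors for four periods.
period_means = [12.0, 18.0, 15.0, 21.0]
thresholds = simple_moving_average(period_means, window=3)
# thresholds == [12.0, 15.0, 15.0, 18.0]

label_slow = classify(25.0, thresholds[3])  # 25 min > 18 min
label_fast = classify(9.0, thresholds[3])   # 9 min < 18 min
```

Smoothing the per-period averages keeps the threshold from reacting to a single unusually congested or empty period, which is the usual motivation for applying an SMA before thresholding.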

Comparison of the Modelling Results
In this study, two different filtering algorithms are considered. The first is the analysis made using time series. According to this analysis, for Bluetooth devices circulating in the region, if Equation (2) holds, the rate of ID addresses belonging to people is 65.57% (397,799 personal device IDs), while the rate of ID addresses belonging to vehicles is 34.43% (208,941 vehicle IDs). The other model is rule-based modeling based on speed. This model establishes an algorithm for determining the speed data of devices and then classifies the speed data obtained. In light of these data, the rate of identity (ID) addresses of individuals was found to be 64.17% (389,392 personal device IDs), while the rate of identity (ID) addresses of vehicles was found to be 35.82% (217,348 vehicle IDs). Here, the personal devices ID set is divided into subsets: Bluetooth devices on objects owned by pedestrians (pedestrian ID), Bluetooth devices on objects owned by people traveling on public transportation (P.T. ID), and Bluetooth devices on objects belonging to residents or people in the region that cannot be included in the calculation because their speed value is 0 (L.R. ID). The results are shown in Tables 6 and 7.
The results of the two filtering approaches are numerically close to each other. However, the result sets are two different datasets containing the ID numbers of devices with a Bluetooth connection, and it would not be realistic to expect the IDs in both datasets to match each other exactly. Therefore, the similarity of the models needs to be determined. The Jaccard index is a fundamental similarity measure with wide applications in various disciplines and offers a standardized approach to measuring similarity between datasets. The Jaccard index was calculated between the two datasets produced by the algorithms.
The Jaccard index value (J) obtained from these calculations was 0.628. Since the index varies between 0 and 1, the overlap between the data is at a satisfactory level. This measure can be considered a parameter that indirectly indicates the success rate of the models.
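For reference, the Jaccard index of two sets A and B is |A ∩ B| / |A ∪ B|. A minimal sketch of the calculation is given below; the device IDs are hypothetical, and it is assumed that the index is computed on the ID sets produced by the two algorithms.

```python
def jaccard_index(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets (an assumption)
    return len(a & b) / len(union)


# Hypothetical ID sets returned by the two filtering algorithms.
time_based_ids = {"A1", "B2", "C3", "D4"}
speed_based_ids = {"B2", "C3", "D4", "E5"}
j = jaccard_index(time_based_ids, speed_based_ids)  # 3 shared IDs / 5 distinct IDs
```

Here three IDs are shared out of five distinct IDs, so J is 0.6. The intersection of the two sets corresponds to the devices on which both algorithms agree.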

Discussion and Suggestions
In the literature, studies utilizing Bluetooth sensors for traffic measurements have primarily focused on the issue of sensor placement [26][27][28], striving to obtain accurate data at the source rather than segregating the data afterwards. However, with the advancement of Bluetooth technologies, this approach has become increasingly challenging. To enhance the reliability of Bluetooth-based traffic measurements and make real-time traffic flow measurement more desirable and cost-effective, new approaches have been introduced that use machine learning techniques and tree-based algorithms with Bluetooth sensors [11]. In a study where modes such as walking, cycling, tram, bus, taxi and private vehicle were categorized based on data collected via GPS from smartphones, significant features from mobility models, including speed, acceleration and rapid acceleration, were extracted and longitudinal dynamics were applied to train the classification model. During validation of the classification algorithm, a decision tree that achieved high accuracy in testing was employed [29]. Although our study, which also classifies transportation modes using a classification algorithm, demonstrates performance similar to the machine learning approach in [29], the two are not directly comparable because of differences in the type of data and in the complexity of the statistical model used. It is anticipated that, with the integration of other systems and given known distributions, this approach will be critically important for real-time traffic monitoring applications in the future.
The biggest problem encountered while conducting the study was the inability to access data from some Bluetooth sensors, or the presence of incomplete data, in the case study area due to maintenance and repair conditions. This situation, which arose from the nature of the existing Bluetooth sensors, made working with the current sensors challenging. Therefore, only sensors without missing or faulty data were used within the planned timeframe.
While concentrating on the Historic Peninsula of Istanbul enables a comprehensive understanding of the city's traffic congestion dynamics, unique factors such as population density, infrastructure layout, and cultural influences may constrain the direct applicability of our results to larger or differently structured cities. Future research should address these disparities to enhance the transferability of our findings to diverse urban environments.

Suggestions
Determining the location of Bluetooth sensors is another problem that needs to be addressed. Specifically, placing Bluetooth sensors in suitable locations (preferably with high vehicle traffic and low pedestrian traffic) is a step towards solving this complexity, particularly for urban traffic measurements.
With the proliferation of smart devices, the number of Bluetooth-equipped devices is increasing daily, and this trend cannot be prevented. It is therefore proposed to add a parameter to the dataset that distinguishes vehicle Bluetooth devices; this proposal can only be implemented systemically in new-generation vehicles.
The algorithms designed in this study, based on statistical methods and rule-based modeling, make it easier to parse data from Bluetooth sensors. To use these devices in transportation systems, it is important to know which of the device identification numbers matched by Bluetooth sensors belong to vehicles. The algorithms make the data easier to work with by reducing noise, but it would be useful to support Bluetooth technology with other systems at certain points to verify the data and calibrate the results.

Concluding Remarks and Future Perspectives
This paper proposes two innovative algorithms for filtering Bluetooth sensor data. The first algorithm involves a simplification process using time series based on the Simple Moving Average (SMA) and threshold models, which are tools of statistical analysis. The second algorithm employs a rule-based simplification mechanism where the data obtained from the sensors are processed and the location information of the sensors is incorporated, utilizing the speed values of the Bluetooth devices accessed. The variables of the filtering problem include the times at which Bluetooth-enabled devices matched with the sensors at specific intervals, the times at which these devices first and last matched with the sensors, the distances between the sensors, the durations the devices stayed within the sensors' range, and the speed values of these devices. The application of the proposed algorithms resulted in a close similarity in the proportions of identified vehicle identity addresses (ID) (time-based filtering algorithm 34.43%; speed-based filtering algorithm 35.82%). Device identities that are common to the datasets obtained from the two algorithms are expressed as an intersection set. The similarity between the results of the two algorithms was calculated using the Jaccard similarity index (J = 0.628). The statistical methods and rule-based models developed in this study significantly facilitate the segregation of data obtained from Bluetooth sensors. Identifying which device identity numbers matched with Bluetooth sensors belong to vehicles is crucial for utilizing these devices in transportation calculations.
In the subsequent phase of the research, origin-destination (O-D) matrices will be created to enable traffic planning and measurements using the obtained vehicle identity devices (vehicle ID). The filtering conducted allows for the creation of a distribution matrix when the Bluetooth sensor locations in the studied area are taken as the origin and destination points. The sequential data from Bluetooth sensors allow for tracking the routes of vehicles by observing which sensors they pass through and in what order. In future studies, the distribution ratios of devices matched by sensors in a specific region can be determined. The distribution matrix serves as a fundamental tool for any operations to be conducted in the desired region. Knowing the distribution of vehicles or persons in the

Figure 3. Finding the vehicle ID number as a result of filtering.

Figure 5. Bluetooth device speed calculation algorithm.

The flow chart prepared to find the identification numbers of Bluetooth devices read by Bluetooth sensors and the passing speed of these devices through the sensors is as follows. The results will be discussed in the next section.

Rule-Based Modeling

Acceptances Based on Speed

• Acceptance of Pedestrian (ID) Addresses: Based on the extensive literature on pedestrian behavior in urban transportation and survey studies [19-22], an average pedestrian speed of 1.5 m/s (5.4 km/h) is chosen. Devices with speeds below 1.5 m/s are accepted as pedestrians.

• Acceptance as Micromobility Vehicles: The maximum speed of micromobility vehicles in Turkey is known to be 25 km/h. According to TomTom traffic data, the average congestion speeds in Istanbul during peak hours in 2023 were determined to be 27 km/h (morning) and 19 km/h (evening). Bicycle speeds range from 17 to 24 km/h,

Figure 6. Rule-based modeling algorithm according to speed values.

Figure 9. Bluetooth sensor devices view in coordinates.

Bluetooth Sensor Data
Data from the period between 1 April 2023 and 16 April 2023 were accessed. During this time, a dataset with 8,282,609 rows was examined in Python 3.11.9. Out of the 39 sensors shown in the location information, data from 35 sensors are present in the dataset. The sensor numbers that appear in the location information but are not in the dataset are 73, 96, 228 and 220. At sensor number 62, data were only read once on 4 April 2023, at 11:43 a.m.

Figure 11. Bluetooth sensor distance matrix code and output.

Sensors 2024 , 25 Figure 12 .
Figure 12. Identification of Bluetooth devices read just once and their removal from the main dataset.

Figure 13. Identification and removal of Bluetooth devices read just once from the main dataset.

Table 1. Distribution of Bluetooth sensors in the historical peninsula region.

Table 2. Distance matrix between Bluetooth sensors.

Table 3. Bluetooth sensor data organizing.

Table 5. Sample section including Bluetooth IDs separated by vehicles.