Incremental Hierarchical Clustering driven Automatic Annotations for Unifying IoT Streaming Data

In this paper, incremental hierarchical clustering is deployed to unify the streaming data in a hierarchical manner. SPARQL queries are used to extract semantic annotations between the hierarchically clustered data. The agents receive raw data streams as input from the IoT sensor devices and classify them to generate the RDF data patterns for hierarchical clustering. The RDF data patterns are combined with the pre-notified metadata of the IoT sensors for the incremental hierarchical clustering process. Finally, the hierarchical streaming data is annotated automatically with semantic annotations using SPARQL queries.


I. Introduction
Semantic technologies address the heterogeneity of devices, communication protocols, and data formats in the Internet of Things (IoT). Annotation of IoT sensor data is the substance of IoT semantics [1]. The next generation of IoT deals not only with physical sensor devices but also with the meanings they carry through virtual representations of smart data. On average, around 3.2 quintillion bytes of data are generated on the Internet every day. Cisco predicts that more than 60 billion devices will be connected to the Internet by 2025; as a result, zettabytes of sensor data will be generated continuously and exponentially. The raw data generated by IoT sensors is stored in data repositories and supports heterogeneous smart city applications. Preparing the raw data for applications may yield structured data with a pre-notified format, date, source, affiliation, unit, and encryption. The next level is perception data, which contains multiple abstractions, from low-level to high-level applications, and provides actionable and predictive data for the final evaluation. Understanding perception data concisely requires structural information. Without it, the data may lead to false results and may fail to integrate with real-time application data [2]. Perception data is extracted from the structured data; it is more compressed and occupies less space than the raw data. Machine Learning (ML) clustering techniques are used to analyze perception data and to generate semantic annotations automatically. Moreover, real-time streaming data plays a major role in cluster analysis in IoT. Streaming data flows continuously as a data stream from the IoT device to the peer network.
Stream processing analyzes clustered data effectively, improves cluster efficiency, and enables quicker decisions on clustered data [3]-[5].
Hierarchical clustering techniques represent logical, temporal, and spatial relations in IoT streaming data. The most important aspect of clustering IoT streaming data is its dynamic and heterogeneous nature. Therefore, a novel clustering mechanism is needed to represent annotations based on hierarchical relationships for IoT streaming data [6]. In this paper, incremental hierarchical clustering is deployed to unify the streaming data in a hierarchical manner. SPARQL queries are used to extract semantic annotations between the hierarchically clustered data. The agents receive raw data streams as input from the IoT sensor devices and classify them to generate the RDF data patterns for hierarchical clustering. The RDF data patterns are combined with the pre-notified metadata of the IoT sensors for the incremental hierarchical clustering process. Finally, the hierarchical streaming data is annotated automatically with semantic annotations using SPARQL queries. Semantic annotation originates mainly from the field of text annotation. It provides machine-readable descriptions along with labels for URIs. Dealing with IoT semantic data is a challenging task for researchers and developers. To address this problem, approaches based on manual and semi-automatic annotation have been proposed for IoT semantic data [7]-[8]. However, manual and semi-automatic annotation are not applicable when the volume of IoT sensor data is huge: they take too long to annotate the data and cannot keep up with the data generated by IoT devices [9]. Therefore, a new and more efficient automatic semantic annotation mechanism is needed.
The main contributions of this work are as follows. First, an architectural model is built using hierarchical clustering driven automatic annotation for unifying IoT streaming data. Second, semantic annotations are added using SPARQL queries. Third, the streaming data is extracted and visualized using the proposed IHC-AA-IoTSD mechanism and SPARQL queries. Fourth, the performance of the proposed model is evaluated. Finally, the proposed architectural model is compared with existing approaches.
The remainder of this paper is organized as follows. Section II discusses the background, related work, and state-of-the-art schemes. Section III presents the proposed mechanism, Incremental Hierarchical Clustering based Automatic Annotation for IoT Streaming Data (IHC-AA-IoTSD). The experimental methodology and evaluation are described in Sections IV and V, respectively. Finally, Section VI concludes this work and outlines its future scope.

II. Background and Related Work
This section discusses related work on semantic annotations in IoT platforms for unifying streaming data efficiently. Most researchers have focused on how to deal with the large volume and variety of data generated by IoT devices. As a result, issues have been identified in ontologies and standards, mapping technologies and exchange systems, semantic annotations, data integration, interoperability, scalability, cluster efficiency, and energy efficiency. Manual and semi-automatic semantic annotations are time consuming because the annotation process relies on manually assigned labels. In addition, they deal with web documents, text documents, and sensor networks. Cyber-Physical Systems (CPS) and IoT generate large volumes of dynamic data and therefore require automatic annotation.
Annotation is the process of adding information to existing data, enriching it with labels, keywords, things, etc. Semantic annotation enriches data with meanings and descriptions. Annotation plays a major role in conveying semantics between humans and machines. It is categorized into three types: manual annotation, semi-automatic annotation, and automatic annotation. In manual annotation, the data is annotated by hand. Keywords chosen by human annotators are used to attach the additional information to the existing data. This yields the highest accuracy, but it takes more time to complete the entire triple data. In semi-automatic annotation, part of the work is carried out with keywords and the rest is completed automatically with a trained, pre-defined set [21]. The process has two steps. In the first step, the annotator annotates the data with keywords. In the second step, semantic annotation tools are used to process the data. Both accuracy and efficiency are improved in this type of annotation system. Automatic annotation is the most advanced annotation system and the one most recently adopted by developers and researchers. Here, the whole process is carried out by the annotation system. Annotation tools such as Gruff (https://franz.com/agraph/gruff/), Jena (http://jena.apache.org/), and Protégé (https://protege.stanford.edu/) are used for this approach. Based on the instructions given by a user, the annotation tool places the corresponding predicates between the subject and the object. Finally, a meaningful label and property are assigned.
Most existing research on semantic annotation has concentrated on semantic Web documents, and only a few studies address automatic semantic annotation of IoT streaming data. As shown in Table I, the authors have compared earlier semantic annotation methods on seven aspects: "Automatic Annotation", "Semi-Automatic Annotation", "Manual Annotation", "Training Data Set", "Application Specific Domain", "Data Type based on", and "Model/Framework/Technology used". The comparison in Table I indicates the following:
• Most of the annotation methods focus on the Internet field and are applied to Web documents.
• Existing research on semantics for Web documents primarily pays attention to ontology-based annotation methods.
• The majority of existing semantic annotation methods for IoT data are manual. Furthermore, they primarily focus on architectural models and deployable frameworks.
The methods compared in Table I are currently the most powerful and popular mechanisms for achieving semantic data integration in IoT platforms. Existing data models are updated with semantic annotations by attaching semantic labels to model elements. Kolozali et al. [23] proposed the SensorSAX and SAX (Symbolic Aggregate Approximation) methods for adaptive and non-adaptive window size segmentation in real-time processing of data streams. Their algorithms improve data aggregation in streaming data efficiently. However, they perform poorly when annotating dynamic IoT data. Mazayev et al. [24] proposed a CoRE framework for data integration and object profiling, which facilitates semantic data annotation, validation of results, and reasoning over annotated data. This framework adopts RESTful resources for validating the user profiling of objects with a CoAP server. However, it is limited in validating and annotating dynamic IoT data efficiently. Mayer et al. [25] developed an Open Semantic Framework (OSF) for industrial IoT applications to turn the web of things into a semantic web of things. The framework is designed to equip industrial things with semantics across IoT domains. However, the OSF has not been implemented with various industrial applications in mind.
Shi et al. [26] concentrated on data semantization in IoT applications. They reviewed the architectural elements and applications supported in the IoT domain. In addition, they surveyed how to add semantics to dynamic IoT data and discussed current research issues and challenges faced by semantic scholars. However, their analysis of IoT data integration techniques is limited. Zamil et al. [27] proposed automatic data annotation techniques for smart home environments by adopting temporal relations. In addition, they incorporated HMM and Random Field models for integrating temporal and spatial relations, which improved the detection accuracy rate. The reported results are moderate, and there is room for improvement with other incremental clustering techniques. Moutinho et al. [28] extended semantic annotations for integrating XML messages by generating translators within the Arrowhead framework. These annotations are not automatic and are domain specific. Therefore, annotating dynamic IoT data with them consumes more time and space.
An exhaustive literature survey has been conducted. Nevertheless, none of the surveyed works meets the requirements of semantic scholars and users for adding automatic semantic annotations to IoT streaming data. Therefore, in this paper, we present the IHC-AA-IoTSD mechanism, which uses SPARQL queries to improve the clustering-based annotation process for IoT sensor streaming data. Through a unification of machine learning and semantic technologies, the proposed approach gives better results in terms of efficiency, reliability, scalability, and security compared to the state-of-the-art schemes.

III. Proposed Mechanism
In this work, the Resource Description Framework (RDF) is used to annotate data objects in a meaningful way and thereby achieve semantic annotations among data samples. The authors have analysed an IoT-based architectural model for incremental hierarchical clustering driven automatic annotation that unifies the streaming data. On this basis, a novel IHC-AA-IoTSD mechanism is proposed for annotating the streaming data semantically. Fig. 1 shows an overview of the simplified architectural model of the proposed work. First, the data generated by IoT sensors is collected from the IoT sensor data world. Once the sensor data has been identified, the agents classify and analyze it. The IoT streaming data comes from the data repository section. RDF is used to interpret the objects in the streaming data, and SPARQL queries are required to extract the data from the triple store. The SPARQL Query Engine consists of three subcomponents: Query Parser (QP), Query Optimizer (QO), and Query Processor (QP). The Query Parser generates the triple patterns in a sequential manner. The Query Optimizer optimizes the SPARQL queries before they are passed to the next component, the Query Processor. The SPARQL Query Engine depicts the overall picture and model of the proposed approach. The workflow of each component is described as follows:

A. Query Parser
This is the first subcomponent of the Query Execution Engine. It receives the healthcare-related SPARQL queries from users, extracts the resources needed by the next subcomponent, the Query Optimizer (QO), and produces a node list for the Query Processor (QP). In this work, we used only basic SPARQL queries with simple SELECT and WHERE clauses. The proposed approach also supports other clauses, such as ORDER BY, GROUP BY, and FILTER.
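As an illustration of what parsing a basic SELECT/WHERE query into triple patterns can look like, the following minimal Python sketch uses a regular expression. It is not the parser used in this work: the function name `parse_basic_query` and the `ex:` prefix in the usage note are hypothetical, and the sketch assumes that terms contain no whitespace (so string literals with spaces are not handled).

```python
import re

def parse_basic_query(query):
    """Split a basic SELECT ... WHERE { ... } query into variables and triple patterns."""
    match = re.search(r"SELECT\s+(.*?)\s+WHERE\s*\{(.*)\}", query,
                      re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("only basic SELECT ... WHERE { ... } queries are handled")
    select_vars = match.group(1).split()
    patterns = []
    # Triple patterns are separated by a whitespace-delimited dot.
    for chunk in re.split(r"\s\.(?:\s|$)", match.group(2).strip()):
        terms = chunk.split()
        if not terms:
            continue
        if len(terms) != 3:
            raise ValueError(f"not a triple pattern: {chunk!r}")
        patterns.append(tuple(terms))
    return select_vars, patterns
```

For example, `parse_basic_query("SELECT ?s ?o WHERE { ?s ex:p ?o . ?o ex:q ?v }")` yields the variable list `["?s", "?o"]` and two triple patterns, which can then be handed to an optimizer step.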

B. Query Optimizer
This is the second subcomponent; it generates a Query Execution Plan (QEP) for the SPARQL query. Query processing is optimized by assessing the input query patterns. The query triple patterns are arranged hierarchically so that the matching values found for one triple pattern can be used for the subsequent triple pattern in the query execution plan.
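One simple way to arrange triple patterns so that each one can reuse the matches of its predecessors is a greedy ordering by shared variables. The sketch below is an assumption made for illustration, not the optimizer of this work: `order_patterns` is a hypothetical name, and the "fewest variables first" starting rule is a common heuristic that the paper does not state.

```python
def is_var(term):
    """SPARQL variables are written with a leading question mark."""
    return term.startswith("?")

def order_patterns(patterns):
    """Greedy plan: each chosen pattern shares a variable with earlier ones when possible."""
    remaining = list(patterns)
    if not remaining:
        return []
    # Heuristic (assumption): start from the pattern with the fewest variables.
    first = min(remaining, key=lambda tp: sum(is_var(t) for t in tp))
    plan = [first]
    remaining.remove(first)
    bound = {t for t in first if is_var(t)}
    while remaining:
        # Prefer a pattern connected to the variables bound so far;
        # fall back to the next remaining pattern if none is connected.
        nxt = next((tp for tp in remaining
                    if bound & {t for t in tp if is_var(t)}), remaining[0])
        plan.append(nxt)
        remaining.remove(nxt)
        bound |= {t for t in nxt if is_var(t)}
    return plan
```

With this ordering, a pattern is evaluated only after at least one of its variables has been bound whenever the query graph is connected, which keeps intermediate results small.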

C. Query Processor
In the Query Processor subcomponent, the matching value results are found, verified against the triple patterns, and finally combined to answer the full query. The valid triple patterns and input queries are arranged in hierarchical and topological order, and the intermediate mismatched patterns are then discarded. Table II shows all the symbols and notations used in this paper. Algorithm 1 translates a given label into an RDF label using the TranslateLabel() function. The input is the number of triples n, the type of each label, and the triple processing time t. The output is the annotated <label> of the RDF data. First, the various types of data are collected from the IoT devices; the triples are then processed from 1 to n, with i as the decision iterator. If the condition $p(t, x_c) \le 1$ is satisfied, every row is extracted and the matching label is added. Finally, the list L is returned.

Algorithm 1: TranslateLabel() translates the given label into an RDF label
Input: number of triples n; type of each data item (type); processing time t.
Output: list of the annotated (<label>) from RDF.
1: collect the various types of data generated from IoT sensors
2: for i := 1 to n do // i is the decision iterator
3:   if p(t, x_c) ≤ 1 as per Eq. 3 then
4:     extract every row and label
5:     add the matched label
6:     close each completed label
7:   end if
9: end for
10: return the list L
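The loop structure of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration only: the row format (a dict with a `"label"` key), the `score` callable standing in for $p(t, x_c)$, and the `threshold` parameter are all hypothetical, because the paper does not specify how the condition is computed.

```python
def translate_label(rows, score, threshold=1.0):
    """Return RDF-style labels for the rows that pass the test score(row) <= threshold."""
    labels = []
    for row in rows:                        # i = 1 .. n
        if score(row) <= threshold:         # stand-in for p(t, x_c) <= 1
            # Normalise the raw label and wrap it as a closed <label>.
            name = str(row["label"]).strip().replace(" ", "_")
            labels.append(f"<{name}>")
    return labels
```

Rows that fail the condition are simply skipped, so the returned list L contains only matched labels.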
Algorithm 2 generates triples from a given RDF label using the GenTriple() function. The input is the number of triples n, the measured value, the type of each label, and the measured time t. The output is the annotated <label> of the RDF data. First, the various types of data are collected from the IoT devices; the triples are then processed from 1 to n, with i as the decision iterator. If the condition label[i].isElement() = 1 is satisfied, each label is extracted and annotated as a triple, and a unique id is allocated to the newly added resource that matched. Finally, the list L with RDF triples is returned.

Algorithm 2: GenTriple() generates triples from a given RDF label
Input: number of triples n; measured value; type of each data item (type); measured time t.
Output: list of the annotated (<label>) from RDF.
1: expand each label
2: for i := 1 to n do // i is the decision iterator
3:   if label[i].isElement() = 1 then // returns 1 when the current label is a matched element
4:     extract every label and annotate it as a triple
5:     allocate the unique id for the newly added resource
6:     close the completed triple list
8:   end if
9: end for
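A possible reading of Algorithm 2 is sketched below in Python. The names are hypothetical and chosen for illustration: `gen_triples`, the `base` namespace `http://example.org/`, and the bracket check used in place of `isElement()` are assumptions, and objects are kept as plain strings rather than typed RDF literals.

```python
from itertools import count

def gen_triples(labels, values, base="http://example.org/"):
    """Turn (label, value) pairs into (subject, predicate, object) triples."""
    ids = count(1)
    triples = []
    for label, value in zip(labels, values):
        # Stand-in for label[i].isElement(): accept only closed <label> elements.
        if not (label.startswith("<") and label.endswith(">")):
            continue
        subject = f"<{base}resource/{next(ids)}>"   # unique id for the new resource
        predicate = f"<{base}{label[1:-1]}>"
        triples.append((subject, predicate, str(value)))
    return triples
```

Chaining this with a label translation step corresponds to the composition GenTriple(TranslateLabel()) used in Algorithm 3.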

Algorithm 3: Transformation of IoT sensor data into annotated RDF data
Input: dataset to annotate; type of each data item (type).
Output: the annotated (<label>) transformed into a reduced triple format from the source data.
Algorithm 3 shows the transformation of IoT sensor data into annotated RDF data. The input is the dataset to annotate together with the type of each data item. The output is the annotated (<label>) transformed into a reduced triple format from the source data. First, the various types of sensor data are collected. This process is repeated until the last triple item is matched. The list L is then annotated as GenTriple(TranslateLabel()). Thereafter, every label is extracted and annotated as a triple, and a unique id is allocated to each newly added resource.

D. Incremental Hierarchical Clustering driven Automatic Annotation Process
The agents play a key role in classifying the data on a time basis, using the matching mechanism to group the occurrences of each resource instance.
The matching objects of the RDF data are denoted as m and the currently captured objects as C. The matching $m \in$ RDF instances are given in Eq. (3.1) and the current capture C in Eq. (3.2), at time $t_n$ within the time interval. $X_m$ and $X_{cc}$ form the instances over the time interval range $t_1 \le t_2 \le t_3 \le \dots \le t_n$ for each individual i.
Here j runs from 1 to n.
For pattern recognition, let r be a resource of any data category d. The scoring function $S_d$ calculates the matching pattern for data d at a particular time T. The individual match value is x at time period t < T and is defined in Eq. (3.3).

(3.3)
To generate the hierarchical clustering tree of the IoT streaming data, the problem is formulated as follows. The raw sensor input is classified by the agents and represented as data streams $DS = \{ds_1, ds_2, ds_3, \dots, ds_n\}$ in the D-dimensional space, with the pre-notified metadata as the k dimensions $\{x_1, \dots, x_k\}$, the pre-clustered streaming data values as $\{x_{k+1}, \dots, x_l\}$, and the cluster distance between data patterns as $dist(cl_1, cl_2)$. Initially, each classified data pattern in DS is assigned to its own cluster, $cl_i = \{ds_i\}$, giving $CL = \{cl_1, \dots, cl_n\}$. The clusters with the minimum distance between their data objects are selected, and the merge operation is then performed until no cluster is left blank or empty. Here $dist(ds_{1i}, ds_{2j})$ may be calculated using the Mahalanobis, Euclidean, or Minkowski distance function in the D-dimensional space. The same procedure is repeated, with data streams added incrementally, until the semantic annotations are extracted from the hierarchical tree by cutting it horizontally or vertically.
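The merge procedure described above can be illustrated with a naive agglomerative sketch in Python. It makes several assumptions that the text leaves open: the cluster distance is taken as single linkage (the smallest pairwise distance), only the Minkowski family is implemented (p = 2 gives the Euclidean distance, Mahalanobis is omitted), and the function names are hypothetical. The sketch is batch rather than incremental.

```python
from itertools import combinations

def minkowski(a, b, p=2):
    """Minkowski distance between two points; p=2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def cluster_distance(c1, c2, points, dist):
    """Single linkage (an assumption): smallest pairwise distance between two clusters."""
    return min(dist(points[i], points[j]) for i in c1 for j in c2)

def agglomerate(points, dist=minkowski):
    """Merge the closest pair of clusters until one remains; return the merge history."""
    clusters = [(i,) for i in range(len(points))]   # each stream starts as its own cluster
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0], pair[1], points, dist))
        merges.append((a, b, cluster_distance(a, b, points, dist)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges
```

The returned merge history is the information needed to draw the hierarchical tree; cutting the tree at a chosen distance corresponds to stopping the merges early.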
The following steps are required to design incremental hierarchical clustering driven automatic annotations for unifying IoT streaming data:
1. The input data streams $DS = \{ds_1, ds_2, ds_3, \dots, ds_n\}$ are obtained from the IoT sensor data repositories in the D-dimensional space.
2. Incremental hierarchical clustering based on the nearest-neighbor chain is used to cluster the streaming data.
3. Starting from any node S in the hierarchical tree, the chain is extended until a Reciprocal Nearest Neighbor (RNN) pair of data samples is found, and these data samples are then agglomerated.
4. The same process continues with the hierarchical tree of the previously annotated objects using RNN.
5. For an RNN pair of objects p and q, object q must satisfy the RNN condition. Thereafter, the clustering distance dist(p, q) is measured using the Euclidean distance function.
6. The linkage, or distance, between clusters of the hierarchical tree is established using Ward's method, which depends on the center of each cluster e and on the number of data samples n_e it contains.
7. Finally, the semantic representations between the clustered hierarchical trees are annotated with SPARQL queries.
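The nearest-neighbor chain procedure with Ward linkage in the steps above can be sketched as follows. This is a minimal batch illustration under stated assumptions, not the incremental implementation of this work: the Ward cost is taken in its common form, the increase in within-cluster sum of squares $n_e n_f / (n_e + n_f) \cdot \lVert c_e - c_f \rVert^2$, and the names `ward` and `nn_chain` are hypothetical.

```python
def ward(ce, ne, cf, nf):
    """Increase in within-cluster sum of squares if clusters e and f are merged."""
    sq = sum((x - y) ** 2 for x, y in zip(ce, cf))
    return ne * nf / (ne + nf) * sq

def nn_chain(points):
    """Nearest-neighbor chain clustering with Ward linkage.

    Returns merges as (id_a, id_b, cost); ids 0..n-1 are the input points,
    and each merge creates the next unused id.
    """
    active = {i: (tuple(map(float, p)), 1) for i, p in enumerate(points)}
    next_id = len(points)
    chain, merges = [], []
    while len(active) > 1:
        if not chain:
            chain.append(min(active))             # start from any active cluster
        top = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # Nearest active neighbour of the chain top; ties favour the previous element.
        best, best_d = None, float("inf")
        for other, (c, n) in active.items():
            if other == top:
                continue
            d = ward(active[top][0], active[top][1], c, n)
            if d < best_d or (d == best_d and other == prev):
                best, best_d = other, d
        if best == prev:
            # top and prev are reciprocal nearest neighbours: agglomerate them.
            chain.pop()
            chain.pop()
            ca, na = active.pop(top)
            cb, nb = active.pop(prev)
            centroid = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb))
            active[next_id] = (centroid, na + nb)
            merges.append((prev, top, best_d))
            next_id += 1
        else:
            chain.append(best)
    return merges
```

The chain algorithm does not necessarily emit merges in order of increasing cost, so the merge list is usually sorted by cost before the hierarchical tree is drawn.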
In the Resource Description Framework (RDF), data is generally stored as a set of statements in triple format, {Subject Sb, Predicate Pr, Object Obj}, which is similar to an entity representation in a DBMS, {entity e, property p, value v}. Subjects and predicates stored in triples are Uniform Resource Identifiers (URIs), whereas objects can be either URIs or literal values. SPARQL (SPARQL Protocol and RDF Query Language) is used for retrieving data stored in RDF repositories. Its syntax is similar to SQL, and a basic query contains two main clauses, SELECT and WHERE. The SELECT clause identifies the variables that will appear in the query results.
The WHERE clause provides the basic graph pattern to match against the data graph. We consider four disjoint sets: Var (variables), Uri (URIs), Blnk (blank nodes), and Ltr (literals).
Almost every SPARQL query contains a set of triple patterns called a basic graph pattern. A basic graph pattern, BGP, is a finite set of triple patterns $\{tp_1, tp_2, tp_3, \dots, tp_n\}$, in which each tp is a triple as shown in Eq. (3.4).

(3.4)
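For orientation, a triple pattern over the four sets Var, Uri, Blnk, and Ltr is commonly formalized as below. This is the usual textbook formulation and is given here as an assumption; the exact form of Eq. (3.4) in this work may differ.

```latex
tp \in (\mathit{Uri} \cup \mathit{Blnk} \cup \mathit{Var})
   \times (\mathit{Uri} \cup \mathit{Var})
   \times (\mathit{Uri} \cup \mathit{Blnk} \cup \mathit{Ltr} \cup \mathit{Var})
```

Under this reading, the subject may be a URI, a blank node, or a variable, the predicate a URI or a variable, and the object any of the four kinds of term.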
The sequence of patterns is framed with different combinations of the triples, as shown in Eq. (3.5). The Query Execution Plan (QEP) is built from the sequence of patterns generated by the triples $(tp_1, tp_2, tp_3, \dots, tp_n)$, for as long as sequence patterns are generated. There must be at least one common element among Subject (S), Object (O), and Predicate (P) between $tp_i$ and $tp_{i+1}$, following one of the patterns shown in Eqs. (3.6)-(3.8). To develop the query execution plan, the query in Algorithm 4 is processed and stored in the triple store (ts). The loaded query maps each triple pattern to its subsequent nodes. In some cases it refers to other triple patterns that match the stored triple values, i.e., (node, [adjacent_TL]). The subject, predicate, and object are matched by applying the SPARQL query, and the corresponding RDF graph is finally generated. Algorithm 4 shows the query execution plan. The motivation behind this plan is to apply ordered triple patterns to indexed RDF data, and to use an improved ordering of the triple patterns, so that queries are processed efficiently. After an ordered triple pattern list has been generated for the execution plan, the residual sequential triple patterns in the triple list are not attached to the current execution plan. Tp denotes a triple pattern in the QEP. The assignment nextN <= get_nextN(Tp, sN) takes the subject as the first node and the object as the second node, or vice versa. The sub-plan subP stores the remaining triple patterns for the QEP. If the appended triple pattern is a subset of the adjacent triple list and TpL is the not-visited list, an intermediate plan is created for annotating the current pattern objects, and this plan is taken as the current query evaluation plan for executing queries. Finally, the next triple and triple pattern are merged with the adjacent triple list and evaluated with the QEP.

16: return QEP
The following steps are required to execute the query plan.
1. First, go to the file menu and select new triple store; an appropriate path name is given for storing the work in the triple store.
2. Once the triple store has identified the path, the maximum number of estimated triples is selected (e.g., 100000).
4. Go to the Query view in the View menu bar and enter the query required for annotating the data in RDF format.
5. Run the SPARQL query; the result is shown as ?s ?p ?o in tabular form.
6. Finally, click the create visual graph icon to generate the annotated RDF graph. Adjust the graph according to the needs of the user.
In the Query Execution Plan (QEP), the subject, predicate, and object are placed in triple pattern format. Therefore, at any point, only vertices and edges are placed. The time complexity for generating the query execution plan is $O(|S| \cdot |P|)$, where |S| is the number of subjects in the healthcare dataset and |P| is the number of predicates in the healthcare dataset. Algorithm 4 and Algorithm 5 together take $O(|S| + |P|)$ computational time, because every subject and predicate (node) is used only once.

IV. Experimental Methodology
This section describes the proposed mechanism with automatic annotations for unifying the IoT streaming data. In addition, the implementation results of the proposed mechanism with SPARQL query processing are discussed.

A. Adding Semantic Annotations to the IoT Streaming Data
In this work, the query processing mechanism and triple patterns are used to achieve semantic annotations. The authors analyzed and tested three different healthcare datasets using incremental hierarchical clustering driven automatic annotations based on IoT streaming data.

B. Query Processing
In this section, the queries are processed and executed according to the query execution plan produced by Algorithm 4. To do so, the correct triple patterns must be retrieved from the triple store and invalid annotation results identified. Algorithm 5 follows these steps using a triple store:
1. The input cN is taken as the common node or node pattern used to retrieve the subsequent triple pattern (tp) from the query execution plan and to generate the annotation results for cN from the triple store.
2. The matching common node cN is attached to the common matching list cML.
3. Each common annotated result is merged into the annotator list of the final matching list.
4. Each matching value is then a subset of the final matching list and contains the annotated attributes used to identify the matching value.
5. The next value is used to get the next node, and the mapping value is added to the triple store.
6. If a node is mapped to the next node, the mapping annotations consisting of the next node, the matching value, and the next node matching list are added. If a node is not matched with the next node, all corresponding matching values and associated annotations are removed.
7. The entire process is repeated until all existing triples have been reached and a mapping exists for the common node cN taken from the triple pattern tp.
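The matching loop described in these steps can be approximated by a small in-memory join over a list of triples. The sketch below is a simplification for illustration and not Algorithm 5 itself: the names `match_pattern` and `evaluate` are hypothetical, the store is a plain Python list, and bindings that fail to match the next pattern are dropped, mirroring the removal of mismatched values in step 6.

```python
def match_pattern(pattern, triple, binding):
    """Extend a binding so that pattern matches triple, or return None on a mismatch."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None             # variable already bound to something else
        elif term != value:
            return None                 # constant term does not match
    return new

def evaluate(plan, store):
    """Join the triple patterns of a plan, in order, against a list of triples."""
    results = [{}]
    for pattern in plan:
        extended = []
        for binding in results:
            for triple in store:
                new = match_pattern(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        results = extended              # unmatched bindings are discarded here
    return results
```

Each dictionary in the result maps the query variables to the terms of one complete match across all patterns in the plan.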
The SPARQL queries are processed to annotate the matching healthcare data and its associated values. The queries consist of different triple patterns $(tp_1, tp_2, \dots, tp_n)$, and the matching subject of any common node is retrieved together with its corresponding predicates. Fig. 2 shows a sample SPARQL query for annotating triples in subject, predicate, and object form. SPARQL queries are widely used to annotate RDF data so that it becomes machine-readable and semantically describable. SPARQL queries can also extract the attribute information of the full dataset with a limit on the number of triples, such as 10k, 20k, 30k, or 100k.

V. Performance Evaluation
This section describes the experimental datasets used for the proposed IHC-AA-IoTSD mechanism. In addition, the metrics used to evaluate the performance of IHC-AA-IoTSD are discussed in detail. Finally, the time complexity of the proposed algorithms is measured.

A. Data Setup
To evaluate the proposed IHC-AA-IoTSD mechanism, three healthcare datasets are used: Heart diseases, Heart attack, and Diabetes. These datasets are openly available from the UCI Machine Learning Repository. Table III shows the dataset details, including the dataset names, the number of triples in each dataset, and the download sources.

B. Experimental Environment
To evaluate the performance of the proposed mechanism, a regular laptop was used with the following configuration: Windows 10 Home 64-bit, 8 GB RAM, 1 TB HDD, 2 cores, 2.2 GHz CPU clock speed, and an Intel® Core™ i7-8750H (8th Gen) CPU. The Gruff tool on the Java 1.8.0 platform was used to experiment with the healthcare data. The Tableau and AllegroGraph tools were used to visualize the data for users. The SPARQL query language was used to annotate the healthcare data so that patients and doctors can communicate in a meaningful way.

C. Performance Metrics
To evaluate the performance of the proposed framework, the following metrics are considered. These metrics are derived from the confusion matrix shown in Table IV.

                     Predicted as "YES"    Predicted as "NO"
Actually as "YES"    True Positive         False Negative
Actually as "NO"     False Positive        True Negative

• True Positive ($Cl_{ant} \to Cl_{ant}$): This is an assessment of clustered annotations considered correctly as clustered annotations.
• True Negative ($NCl_{ant} \to NCl_{ant}$): This is an assessment of non-clustered annotations considered correctly as non-clustered annotations.
• False Positive ($NCl_{ant} \to Cl_{ant}$): This is an assessment of non-clustered annotations considered incorrectly as clustered annotations.
• False Negative ($Cl_{ant} \to NCl_{ant}$): This is an assessment of clustered annotations considered incorrectly as non-clustered annotations.

True Positive Rate (TPR)
TPR gives the sensitivity and measures the proportion of correctly clustered annotations in the dataset, as shown in Eq. (4.1). Eq. (4.2) corresponds to the true negative rate (TNR).

False Positive Rate (FPR)
FPR measures the significance level, i.e., the proportion of non-clustered annotations that are interpreted as clustered annotations in the automatic annotation process for the input dataset sequence, as shown in Eq. (4.3).

Accuracy
Accuracy is the first step in performance measurement. It is the ratio of the number of correct clustered annotations made to the total number of clustered annotations made, as shown in Eq. (4.5).

Precision, Recall & F-measure
Precision describes the exactness of the clustered data, and Recall describes its completeness. Together, Precision and Recall characterize the detection accuracy of the data more closely than accuracy alone, which says little about false results. The F-measure is the harmonic mean of Precision and Recall. Eqs. (4.6) to (4.8) define Precision, Recall, and F-measure, respectively. These ML metrics are applied to the proposed mechanism to assess and improve cluster efficiency and the unification of the IoT streaming data.
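The metrics above can be computed directly from the four confusion-matrix counts. The Python sketch below uses the standard textbook definitions, which are assumed to agree with Eqs. (4.1) to (4.8); the function name `clustering_metrics` and the convention of returning 0 for an empty denominator are choices made for this illustration.

```python
def clustering_metrics(tp, fp, tn, fn):
    """Standard rates derived from the four confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0    # convention: empty denominator gives 0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "tpr": recall,                                   # sensitivity
        "tnr": ratio(tn, tn + fp),                       # specificity
        "fpr": ratio(fp, fp + tn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        # Harmonic mean of precision and recall.
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }
```

Because the F-measure is a harmonic mean, it stays low whenever either Precision or Recall is low, which is why it is reported alongside accuracy.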

D. Experimental Results and Discussions
This experiment is conducted on three healthcare datasets, namely Heart Diseases, Heart Attack, and Diabetes, with triple sizes of 10k, 20k, 30k, 40k, 50k, and 100k. To annotate the objects of the streaming data, six SPARQL queries are used to evaluate the proposed IHC-AA-IoTSD mechanism, as represented in Figs. 3 to 10. SPARQL query 1, shown in Fig. 3, retrieves the drug types and values annotated with hierarchical clustered data. SPARQL query 2, shown in Fig. 4, extracts the unique heart attack attributes and their values from the Heart Attack dataset. The SPARQL queries enrich all attributes for annotation and annotate the various attributes with low execution time. SPARQL query 3 is shown in Fig. 5 and its resultant RDF graph in Fig. 7. The hierarchical tree-based predicates are annotated over the various triple data objects.

SPARQL query 4 is shown in Fig. 6, and Fig. 8 shows its output. The Diabetes dataset contains predicates such as row id, value, and number of deaths. Through the annotation process, the year-wise death rates have been enriched as well as extracted. SPARQL query 5, shown in Fig. 9, is run on the Heart Diseases dataset to annotate the healthcare records in subject, predicate, and object form. It indicates that the whole dataset is annotated accurately. SPARQL query 6, shown in Fig. 10, annotates the Heart Diseases data on a value and predicate basis. Here, the annotated predicates include the year-wise number of national payments, payment for heart diseases, measure id, measure name, measure start date, measure end date, and type, together with their corresponding values. SPARQL queries 5 and 6 are used in this paper to annotate the healthcare data with triple sizes varying up to 100k triples. These results are not presented because the annotations are too complex to display legibly to users. However, the results of SPARQL queries 1 to 6 indicate that automatic annotations are preferable to manual and semi-automatic annotations, because in automatic semantic annotation the trained and classified data are labelled by an automated annotation system. The average execution time of the various queries is measured, and it is the lowest compared with the ATLAS [22], FBASAM [11], and OBSAA [19] approaches.
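To make the subject, predicate, and object matching behind such queries concrete, the following is a minimal sketch in plain Python rather than a SPARQL engine. It mimics the effect of a pattern like SELECT DISTINCT ?p ?o WHERE { ?s ?p ?o } on a handful of triples. The triples, the ex: identifiers, and the function names are hypothetical and are not taken from the datasets or from the queries in Figs. 3 to 10.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples; identifiers are illustrative, not from the datasets.
TRIPLES: List[Triple] = [
    ("ex:record1", "ex:age", "63"),
    ("ex:record1", "ex:chestPainType", "3"),
    ("ex:record2", "ex:age", "63"),
    ("ex:record2", "ex:chestPainType", "2"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None plays the role of a variable."""
    for ts, tp, to in triples:
        if (s is None or s == ts) and (p is None or p == tp) and (o is None or o == to):
            yield (ts, tp, to)


def distinct_attribute_values(triples: Iterable[Triple]) -> List[Tuple[str, str]]:
    """Rough analogue of: SELECT DISTINCT ?p ?o WHERE { ?s ?p ?o }"""
    seen: List[Tuple[str, str]] = []
    for _, p, o in match(triples):
        if (p, o) not in seen:
            seen.append((p, o))
    return seen


unique_pairs = distinct_attribute_values(TRIPLES)
ages = list(match(TRIPLES, p="ex:age"))
```

A real deployment would run the SPARQL queries against an RDF store; the sketch only shows how fixing or leaving open each position of a triple pattern selects the annotated values.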
The first experimental investigation validates IHC-AA-IoTSD through the TPR obtained with various triple sizes at a stable FPR of 10%, 20%, 30%, and 40%, compared against the benchmark mechanisms ATLAS, FBASAM, and OBSAA, as observed in Figs. 11, 12, and 13 respectively.
Fig. 11 (a-d) shows the leading TPR value of the proposed IHC-AA-IoTSD on the Heart Diseases dataset over ATLAS, FBASAM, and OBSAA at a stable FPR of 10%, 20%, 30%, and 40% respectively. Fig. 11 (a) shows that, under 10% FPR, IHC-AA-IoTSD preserves a TPR of around 0.95 at dynamically allocated triples, about 12% higher than the benchmark mechanisms ATLAS, FBASAM, and OBSAA. Fig. 11 (b) shows that, under 20% FPR, IHC-AA-IoTSD maintains a TPR of around 0.92 at dynamically allocated triples even though the FPR is increased, about 13% higher than the benchmark mechanisms. Likewise, Fig. 11 (c) shows that, under 30% FPR, IHC-AA-IoTSD sustains a TPR of around 0.9 at various dynamically allocated triples, about 11% higher than the benchmark mechanisms. Similarly, Fig. 11 (d) shows that, under 40% FPR, IHC-AA-IoTSD achieves a TPR of around 0.88 at dynamically allocated triples, about 8% higher than the benchmark mechanisms.
Fig. 12 (a-d) shows the dominant TPR value of the proposed IHC-AA-IoTSD on the Heart Attack dataset over ATLAS, FBASAM, and OBSAA at a stable FPR of 10%, 20%, 30%, and 40% respectively. Fig. 12 (a) shows that, under 10% FPR, IHC-AA-IoTSD preserves a TPR of around 0.92 at dynamically allocated triples, about 12% higher than the benchmark mechanisms. Fig. 12 (b) shows that, under 20% FPR, IHC-AA-IoTSD maintains a TPR of around 0.9 at dynamically allocated triples even though the FPR is increased, about 13% higher than the benchmark mechanisms. Likewise, Fig. 12 (c) shows that, under 30% FPR, IHC-AA-IoTSD sustains a TPR of around 0.88 at various dynamically allocated triples, about 11% higher than the benchmark mechanisms. Similarly, Fig. 12 (d) shows that, under 40% FPR, IHC-AA-IoTSD achieves a TPR of around 0.86 at dynamically allocated triples, about 7% higher than the benchmark mechanisms.
Fig. 13 (a-d) shows the dominant TPR value of the proposed IHC-AA-IoTSD on the Diabetes dataset over ATLAS, FBASAM, and OBSAA at a stable FPR of 10%, 20%, 30%, and 40% respectively. Fig. 13 (a) shows that, under 10% FPR, IHC-AA-IoTSD preserves a TPR of around 0.94 at dynamically allocated triples, about 13% higher than the benchmark mechanisms. Fig. 13 (b) shows that, under 20% FPR, IHC-AA-IoTSD maintains a TPR of around 0.92 at dynamically allocated triples even though the FPR is increased, about 12% higher than the benchmark mechanisms. Likewise, Fig. 13 (c) shows that, under 30% FPR, IHC-AA-IoTSD sustains a TPR of around 0.9 at various dynamically allocated triples, about 11% higher than the benchmark mechanisms. Similarly, Fig. 13 (d) shows that, under 40% FPR, IHC-AA-IoTSD achieves a TPR of around 0.87 at dynamically allocated triples, about 9% higher than the benchmark mechanisms.
The second experimental investigation validates IHC-AA-IoTSD through detection accuracy, TNR, FNR, TPR, Precision, and FPR against the benchmark mechanisms ATLAS, FBASAM, and OBSAA. Fig. 14 (a) represents the average detection accuracy of IHC-AA-IoTSD on the three healthcare datasets with various triple sizes. On the Heart Diseases dataset from the UCI data repository, IHC-AA-IoTSD achieves the best detection accuracy, 9-94% from 10k triples to 100k triples, whereas ATLAS achieves 7-90%, FBASAM 5-81%, and OBSAA 2-75% over the same range. On the Heart Attack dataset from the Kaggle data repository, IHC-AA-IoTSD achieves a detection accuracy of 11-97% from 10k triples to 100k triples, whereas ATLAS achieves 9-93%, FBASAM 7-89%, and OBSAA 4-84%. On the Diabetes dataset from the UCI data repository, IHC-AA-IoTSD achieves a detection accuracy of 10-96% from 10k triples to 100k triples, whereas ATLAS achieves 7-90%, FBASAM 5-84%, and OBSAA 3-80%. On average, IHC-AA-IoTSD improves detection accuracy over the ATLAS mechanism by 4% at 10k triples and by the same margin at 100k triples.
After combining the results of the three healthcare datasets, the detection accuracy of IHC-AA-IoTSD improves by 2%, 4%, and 7% over the ATLAS, FBASAM, and OBSAA techniques respectively. This effectiveness is primarily attributable to the enhanced multi-agent based semantic annotation process used for classifying and testing, and to the agent-based automatic semantic process in the IHC-AA-IoTSD annotation mechanism. Fig. 14 (b) shows that IHC-AA-IoTSD enhances the TNR value by 15-23% relative to ATLAS, FBASAM, and OBSAA, which corresponds to an improvement of 2%, 5%, and 10% from 10k triples to 100k triples. The enhancement of the TNR value indicates that IHC-AA-IoTSD performs better because the patient and doctor annotations of the healthcare data are enabled in the detection process.
Fig. 14 (c) depicts the reduced FNR of IHC-AA-IoTSD under changing triple sizes. On the Heart Diseases dataset, it reduces the FNR by about 16-30%, which is nearly 8% lower at 10k triples and 12% lower at 100k triples than the ATLAS framework. On the Heart Attack dataset, it reduces the FNR by about 17-29%, which is nearly 8% lower at 10k triples and 8% lower at 100k triples than the ATLAS framework. Similarly, on the Diabetes dataset, it reduces the FNR by about 18-29%, which is nearly 8% lower at 10k triples and 12% lower at 100k triples than the ATLAS approach. The decrease in FNR at 10k triples is nearly 8%, 14%, and 16% on the Heart Diseases dataset, nearly 8%, 12%, and 22% on the Heart Attack dataset, and nearly 6%, 14%, and 22% on the Diabetes dataset compared to the ATLAS, FBASAM, and OBSAA techniques respectively. After combining the results of the three healthcare datasets, the FNR decreases by 7%, 13%, and 18% compared to the ATLAS, FBASAM, and OBSAA techniques respectively.
Fig. 14 (d) represents the TPR value of IHC-AA-IoTSD under changing triple sizes. The result shows that it enhances the TPR value by 34-23%, which is nearly 5%, 9%, and 11% higher than the TPR obtained by ATLAS, FBASAM, and OBSAA on the three healthcare datasets at 10k triples. Taken on average, it achieves a TPR that is 5%, 9%, and 11% higher than that of ATLAS, FBASAM, and OBSAA. The strength of IHC-AA-IoTSD lies in the agent preprocessing mechanism used for annotating the triple data and in the SPARQL queries, which are well suited to healthcare data annotations. Fig. 14 (e) represents the Precision of IHC-AA-IoTSD under varying triple sizes of 10k, 20k, 30k, 40k, 50k, and 100k on the three datasets. The result shows that it enhances the Precision value by around 5-21%, which is nearly 5%, 10%, and 17% higher than the Precision obtained by ATLAS, FBASAM, and OBSAA on the Heart Diseases dataset at 10k triples. Similarly, it is nearly 6%, 10%, and 15% higher on the Heart Attack dataset at 10k triples, and nearly 7%, 12%, and 21% higher on the Diabetes dataset at 10k triples. After combining the results of the three healthcare datasets, the Precision increases by around 6%, 11%, and 17% over the ATLAS, FBASAM, and OBSAA techniques respectively.
Fig. 14 (f) depicts the reduced FPR of IHC-AA-IoTSD under varying triple sizes. On the Heart Diseases dataset, it reduces the FPR by around 6-24%, which is nearly 3% lower at 10k triples and 6% lower at 100k triples than the ATLAS framework. On the Heart Attack dataset, it reduces the FPR by about 8-29%, which is nearly 5% lower at 10k triples and 8% lower at 100k triples than the ATLAS framework. Similarly, on the Diabetes dataset, it reduces the FPR by about 5-25%, which is nearly 7% lower at 10k triples and 11% lower at 100k triples than the ATLAS approach. The decrease in FPR at 10k triples is nearly 3%, 6%, and 10% on the Heart Diseases dataset, nearly 5%, 8%, and 14% on the Heart Attack dataset, and nearly 7%, 11%, and 16% on the Diabetes dataset compared to the ATLAS, FBASAM, and OBSAA techniques respectively. After combining the results of the three healthcare datasets, the FPR decreases by 5%, 8%, and 13% compared to the ATLAS, FBASAM, and OBSAA techniques respectively.
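The text does not state how the three dataset results are combined. As a quick check, assuming the combination is a plain arithmetic mean of the per-dataset decreases at 10k triples, the FPR figures quoted above reproduce the combined values of 5%, 8%, and 13% after rounding. The variable names below are illustrative.

```python
# Per-dataset FPR decreases at 10k triples, as quoted in the text, in the
# order Heart Diseases, Heart Attack, Diabetes.
fpr_decrease = {
    "ATLAS": [3, 5, 7],
    "FBASAM": [6, 8, 11],
    "OBSAA": [10, 14, 16],
}

# Assumption: "combining" means the arithmetic mean, rounded to a whole percent.
combined = {
    name: round(sum(values) / len(values))
    for name, values in fpr_decrease.items()
}
```

The same rule is only approximately consistent with the combined Precision and FNR figures, so the averaging assumption should be read as a plausibility check rather than as the stated procedure.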
The third experimental investigation validates IHC-AA-IoTSD through the Average Execution Time of the queries against the benchmark mechanisms ATLAS, FBASAM, and OBSAA. Fig. 15 (a-d) shows the measured average execution time of queries Q1 to Q6 at 10k triples, 20k triples, 30k triples, and 50k triples respectively. The results show that IHC-AA-IoTSD maintains an Average Execution Time of 27 ms across the queries, a 12% improvement over ATLAS, FBASAM, and OBSAA. Fig. 15 (a-d) also highlights the advantage of IHC-AA-IoTSD in Average Execution Time over ATLAS, FBASAM, and OBSAA as the triple size grows: IHC-AA-IoTSD sustains an Average Execution Time of 86 ms across the queries even when the triple size is increased. Across all the queries, IHC-AA-IoTSD achieves an Average Execution Time about 16% better than ATLAS, FBASAM, and OBSAA.
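The measurement procedure for Average Execution Time is not described in detail. One common way to obtain such a figure, sketched here under that assumption, is to repeat a query several times and average the wall-clock durations. The function names are hypothetical, and dummy_query is only a stand-in for executing a SPARQL query against a triple store.

```python
import time
from statistics import mean


def average_execution_ms(run_query, repetitions=10):
    """Return the mean wall-clock time of run_query() in milliseconds."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def dummy_query():
    # Hypothetical stand-in for running one SPARQL query.
    return sum(range(10_000))


avg_ms = average_execution_ms(dummy_query, repetitions=5)
```

Averaging over repetitions dampens the effect of caching and scheduling noise, which matters when the reported differences between mechanisms are in the range of a few milliseconds.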

E. Complexity Analysis
Moreover, the time complexity of the IHC-AA-IoTSD scheme, which uses Algorithms 1 to 3, is denoted T(n), where the algorithms process instances for j from 1 to n and for i between 1 and 9. The time complexity of Algorithm 1 is denoted T_1(n), that of Algorithm 2 T_2(n), and that of Algorithm 3 T_3(n). Finally, these three time complexities are combined to obtain the overall time complexity T(n). The time complexity T_1(n) is derived as follows in Eq. (5.1).
Similarly, the time complexity of Algorithm 2 is derived as follows in Eq. (5.2).

VI. Conclusion and Future Work
In the IoT streaming data era, sensor devices continuously generate dynamic, heterogeneous data, including real-time streaming data. Analysing and annotating this streaming data is a current research problem. Therefore, in this paper, the authors proposed the IHC-AA-IoTSD mechanism for unifying the hierarchical clustered data using SPARQL queries. The experimental investigation of IHC-AA-IoTSD has been conducted on three popular healthcare datasets by varying the triple size and measuring detection accuracy, precision, TPR, TNR, FPR, and FNR. In the first experimental investigation, the TPR value has been measured under streaming triples at a stable FPR of 10%, 20%, 30%, and 40% respectively. In the second experimental investigation, the average results have been taken into account, and they show that IHC-AA-IoTSD outperforms benchmark mechanisms such as ATLAS, FBASAM, and OBSAA. In the third experimental investigation, the average query execution time has been calculated for six different queries under 10k, 20k, 30k, and 50k triples. Considering that IoT streaming data is dynamic and heterogeneous, the proposed mechanism copes with this by efficiently annotating the hierarchical clustered data. Moreover, the proposed IHC-AA-IoTSD mechanism outperforms the existing state-of-the-art schemes. In future, the proposed mechanism can be optimized by using a hash table (key, value pairs) for storing SPARQL queries. In addition, artificial intelligence systems need quicker decisions on streaming data; in this scenario, the proposed mechanism may be useful and may achieve efficient results. Besides, advanced deep learning techniques such as Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) can be considered for annotating IoT sensor data.