Storing data from sensor networks

Sensor networks generate massive amounts of heterogeneous, multidimensional data with specific characteristics. For the storage, processing and visualisation of this data, specially designed sensor databases that can be integrated into cloud structures are most often used. This paper presents a block diagram of the integration of data from sensor networks into the cloud, which addresses essential issues concerning the cost of sensor networks and the possibilities for storing sensor data in specialised cloud databases. The peculiarities of sensor data and the specific requirements for their storage and processing are analysed, and on this basis the approaches and models for storage are identified, together with the possibilities for their application. Time-series databases (TSDB) and cloud databases suitable for storing sensor data are systematised, and recommendations for their selection are made.


Introduction
We live in the era of the Internet of Things (IoT), in which billions of devices are connected and generate data at increasing speed. Extracting information from these data gives a competitive advantage to people, businesses, organisations and governments. Networks of intelligent IoT devices, also called sensor nodes in wireless sensor networks (WSN), have been a rapidly evolving field of research in recent years. These networks are widely used to collect sensor data in support of environmental monitoring, detection of anomalies in patients' health, industrial applications, border security, etc. The data generated by sensor networks is enormous in volume, heterogeneous and multidimensional. Storing and processing this data would require significant hardware resources (storage and computing) in the sensor nodes, yet it is known that sensor nodes have limited computational and energy capacity [1], [2]. Integrating sensor networks with clouds is an excellent solution to the limited computing power of sensors: it extends the life of the sensor network and provides storage and processing for the collected data. The data is most often stored in databases, which may be integrated into the cloud structures. We are often faced with the dilemma of which database is suitable for storing sensor data. This paper presents a block diagram of the integration of sensor data that addresses the problem of storing large volumes of sensor data and the cost of sensor networks. It analyses the features of sensor data and the specific requirements for their storage and processing. Sensor databases and cloud databases suitable for storing sensor data are systematised, and recommendations are made for selecting an appropriate database for the storage and processing of sensor data.

Related work
In [3], issues related to data security in different cloud models are analysed. The authors compare the protection provided by the different cloud models and conclude that in all three models data security depends on a secure and reliable network and a secure web browser. A distinction is made between security in the clouds depending on vulnerabilities and threats. The authors of [4] review various techniques for data collection and storage with regard to their energy efficiency. To reduce network traffic when there are strong spatio-temporal correlations between sensor data, coding and classification of the sensor data by category of IoT node is proposed. In [5], an experimental study is presented of the effectiveness of sensor data storage systems implemented on top of a Linux file system, depending on the number of data sources. The authors conclude that fewer hardware resources are needed when a log-based system is used to build sensor storage systems. Such a system stores a small number of log files and uses a sensor-optimised mechanism for indexing (by time) and searching the data in the log files. The authors of [6] present an analytical study of trends in the storage and processing of sensor data. There is a tendency not to throw anything away: all received data is saved in the hope that existing or new technologies will allow all of the collected data to be analysed. The authors believe that not only secure data storage but also the ability to access the data effectively on demand is essential. In [7] and [8], adaptive data sampling is proposed to reduce the number of data samples, and sensor data are forwarded to the coordinator only when a significant change in the behaviour of the sensor nodes is detected.
IOP Publishing, doi:10.1088/1757-899X/1032/1/012012

Block diagram of the integration
As explained above, connecting the sensor network to a cloud structure solves the problem of storing and processing the large volumes of data generated by sensor networks. The communication between the WSN and the cloud computing system is shown in figure 1. The data collected and recorded by the individual sensors is transferred to the cloud, where it is processed and can be viewed, manipulated and modified via client devices through an application, a web page or an application programming interface (API) [9]. Figure 2 is a block diagram illustrating the communication method. Data stored in a cloud computing system can be accessed by many different client devices, including mobile devices such as laptops and cell phones. Cloud computing systems are more reliable than dedicated server configurations, so measurements taken by sensor networks are more likely to be recorded and/or processed by such cloud systems. For example, if a dedicated server becomes inaccessible for some reason, the data generated and recorded by the sensor network during the server failure is lost. In the case of medical sensors that record vital patient data, lost data can lead to incorrect treatment decisions. In the case of malfunctioning temperature sensors in computer equipment, lost data can lead to damaged equipment, downtime and the unavailability of other network services. Cloud computing systems for recording measurements from sensor networks can also be more economical than dedicated servers, as they are often paid for by use and do not require the purchase of large amounts of computer equipment.

[Figure: block diagram of the communication method. Steps: start; sensor activation and data collection; sending the data to the network coordinator; pre-processing of the data in the coordinator; sending the data to the cloud for processing; data processing in the cloud; viewing, manipulating and modifying information via client devices.]

Data transmission via the coordinator
Different scenarios for data transmission are possible when integrating sensor networks with cloud structures. For example, the gateway can calculate the average of the values measured by the sensors, perform compression or other data encoding, and transmit raw or pre-processed data directly to the cloud structure. The data transmitted to the cloud structure is encrypted. The coordinator may run an algorithm for detecting certain events and then send a signal to the cloud computing system and/or send only the captured data. For example, the coordinator gateway can be configured to detect peaks in the measured data received from the sensor network, or the sensor can be configured to transmit only data that exceed a configured threshold. The cloud computing system receives the data from the coordinator and records it, most often in an appropriate database. Relocating processing functions to the cloud computing system allows sensors and coordinators to be built with only the necessary components, which reduces the overall cost of sensor networks. This is especially important considering that a sensor network can contain hundreds or even thousands of sensors. Processing in the cloud computing system may include data analysis, data validation, data clean-up and/or data transformation.
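The pre-processing options mentioned above (averaging, threshold filtering and peak detection in the coordinator) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the sample values and the definition of a peak as a strict local maximum are hypothetical and are not taken from the paper.

```python
from statistics import mean
from typing import Iterable, List


def average_readings(readings: Iterable[float]) -> float:
    """Reduce a batch of raw readings to one averaged value."""
    return mean(readings)


def exceeds_threshold(readings: Iterable[float], threshold: float) -> List[float]:
    """Keep only the readings strictly above the configured threshold."""
    return [r for r in readings if r > threshold]


def detect_peaks(readings: List[float]) -> List[int]:
    """Return indices of local maxima (strictly greater than both neighbours)."""
    return [
        i
        for i in range(1, len(readings) - 1)
        if readings[i - 1] < readings[i] > readings[i + 1]
    ]


batch = [21.0, 21.5, 29.0, 22.0, 21.5]  # hypothetical temperature samples
print(average_readings(batch))          # 23.0
print(exceeds_threshold(batch, 25.0))   # [29.0]
print(detect_peaks(batch))              # [2]
```

In each case the coordinator forwards less data than it receives, which is the point of placing this step before the cloud upload.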

Data transmission directly
In this embodiment, the coordinator may be absent, or the gateway may perform another role. In this case, the coordinator gateway or the cloud computing system can activate the sensor via an RF module to read the measurement result.

Functions of client devices
Client devices, such as a computer, tablet or mobile phone, may be connected to the cloud computing system and granted access to display, modify, delete or otherwise manipulate data in it, including data received from the coordinator gateway of the sensor network and other data stored in the cloud. Client devices can also direct measurement requests to every sensor or only to a specific sensor on the network. In addition, they can configure the sensors and the coordinator, for example by configuring the coordinator for pre-processing or by configuring a sensor to perform measurements at regular intervals, such as every hour or every day. Specific client devices can also view or process data.

Communication between sensors and the cloud
The block diagram of the method of communication between the sensor network and the cloud is shown in figure 2. After the sensor is activated, the generated data is transmitted to the coordinator. The coordinator can pre-process the data or transfer it raw to different databases in the cloud.

Characteristics of sensor data
Data from a single sensor form a simple time series. Often the values reported by a sensor do not change for an extended period. There are also cases where the volume of data grows rapidly, with several readings per second from one or more sensors in the same network. The analysis of modern solutions for the storage and processing of sensor data aims to identify the approaches and the opportunities for their application.
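Because a sensor often reports the same value for long periods, one simple way to reduce the stored volume is to keep a sample only when the value changes. The sketch below is an illustration of that idea and is not a method proposed in the paper; the function name, the tolerance parameter and the sample data are assumptions.

```python
from typing import Iterable, Iterator, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)


def changes_only(samples: Iterable[Sample], tolerance: float = 0.0) -> Iterator[Sample]:
    """Yield a sample only when it differs from the last kept value by more than tolerance."""
    last_value: Optional[float] = None
    for timestamp, value in samples:
        if last_value is None or abs(value - last_value) > tolerance:
            last_value = value
            yield timestamp, value


raw = [(0, 20.0), (60, 20.0), (120, 20.0), (180, 20.5), (240, 20.5)]
stored = list(changes_only(raw))
print(stored)  # [(0, 20.0), (180, 20.5)]
```

Five raw samples are reduced to two stored points, and the original step-wise signal can still be rebuilt by holding each value until the next stored timestamp.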

Basic models for storage and processing of sensor data
At the moment, the following models are distinguished:
• storage and processing of data outside the sensor network;
• distributed storage and processing within the network;
• combined solutions.
The first model treats sensor data as a continuous stream that accumulates without loss in the sensor network and is then transmitted and archived outside it. The collected data can be kept in different stores and queried using standard methods. This model is used by a large number of deployed networks designed to monitor all kinds of phenomena. Such networks are easy to implement but short-lived, especially when they use high-speed sensors such as cameras, acoustic or vibration sensors, because the data transmission requirements often exceed the available energy resources.
The second model considers the sensor network as a distributed database that supports requests for data provision and processing. In this model, a request is directed to the sensor network, possibly to remote sensor nodes that store their data locally. This architecture is potentially more energy-efficient because the request is processed on a smart sensor node and its result is transmitted as processed data rather than raw data. The energy cost of transmitting large amounts of data over the network exceeds the cost of processing it locally [6].
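The difference between the first two models can be illustrated by counting how many values leave a node in each case. The following is a minimal sketch, assuming a hypothetical SensorNode class that keeps its readings in local storage; it does not model radio energy, only the amount of data transmitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SensorNode:
    """Hypothetical smart node that keeps its readings in local storage."""
    node_id: str
    readings: List[Tuple[int, float]] = field(default_factory=list)

    def raw(self, start: int, end: int) -> List[float]:
        """First model: ship every reading in [start, end] off the network."""
        return [v for t, v in self.readings if start <= t <= end]

    def query(self, start: int, end: int,
              reduce: Callable[[List[float]], float]) -> float:
        """Second model: evaluate the request locally and return only the result."""
        return reduce(self.raw(start, end))


node = SensorNode("n1", [(t, 20.0 + t % 5) for t in range(100)])
values_sent_raw = len(node.raw(0, 99))  # 100 values cross the network
answer = node.query(0, 99, max)         # a single value crosses the network
print(values_sent_raw, answer)          # 100 24.0
```

The saving grows with the length of the queried interval, which is why the second model is attractive when transmission dominates the energy budget.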

External storage and processing of sensor data
In centralised processing, the data from the sensor nodes is stored on a server outside the sensor network. A popular way to organise this storage is a relational database. Although general-purpose relational databases can store time-series data, they do not work efficiently with an ordered set of time-series elements because they lack time-specific optimisations for storing and retrieving data over time intervals.
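To make the relational approach concrete, the sketch below stores readings in an in-memory SQLite table and runs an interval query with five-minute averages. The table layout, column names and sample values are assumptions chosen for illustration. Note that the time bucketing has to be written by hand with integer arithmetic, since a general-purpose relational engine offers no built-in time-series operators.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reading (
        sensor_id TEXT    NOT NULL,
        ts        INTEGER NOT NULL,   -- Unix time in seconds
        value     REAL    NOT NULL,
        PRIMARY KEY (sensor_id, ts)
    )
    """
)

base = 1_600_000_200  # hypothetical start time, aligned to a 300-second boundary
rows = [("temp-1", base + 60 * i, 20.0 + i) for i in range(10)]
conn.executemany("INSERT INTO reading VALUES (?, ?, ?)", rows)

# Interval query: average value per 5-minute bucket for one sensor.
buckets = conn.execute(
    """
    SELECT ts / 300 * 300 AS bucket, AVG(value)
    FROM reading
    WHERE sensor_id = ? AND ts BETWEEN ? AND ?
    GROUP BY bucket
    ORDER BY bucket
    """,
    ("temp-1", base, base + 540),
).fetchall()
print(buckets)  # [(1600000200, 22.0), (1600000500, 27.0)]
```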

TSDB time-series databases
Time-series databases (TSDB) are specially designed to store sensor data. They belong to the group of non-relational (NoSQL, "Not only SQL") databases because they use query languages that are different from, but similar to, SQL. NoSQL databases are mostly used in real-time web and big data applications. A time-series database is a software system optimised for handling time-series data, that is, arrays of numbers indexed by time (a DateTime or a DateTime range) [10]. The timestamp usually has a precision of seconds or milliseconds, but with remote data collection the precision can reach nanoseconds. The timestamp supports automatic adjustment of time zones. Data can be reported periodically or when an event occurs. A TSDB is optimised for measuring change over time. The properties that make time-series data very different from other data workloads are data lifecycle management, aggregation and scans over large ranges of many records. TSDBs provide users with a service for storing, deleting, updating and modifying time series. All solutions support some time-series calculations (e.g. multiplication, addition and other manipulations that convert one or more series into a new series). There are also tools for filtering on arbitrary samples, in which the values of one series can act as a filter for another.
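The three properties named above (data lifecycle management, aggregation and calculations that combine series) can be sketched on plain Python lists. This is a toy in-memory illustration, not the implementation of any particular database; the function names and the sample series are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Series = List[Tuple[int, float]]  # (timestamp in seconds, value), sorted by time


def apply_retention(series: Series, now: int, max_age: int) -> Series:
    """Data lifecycle management: drop points older than max_age seconds."""
    return [(t, v) for t, v in series if now - t <= max_age]


def downsample_mean(series: Series, window: int) -> Series:
    """Aggregation: replace each window (in seconds) with the mean of its points."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for t, v in series:
        buckets[t - t % window].append(v)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]


def add_series(a: Series, b: Series) -> Series:
    """Series arithmetic: add two series point-wise on matching timestamps."""
    b_by_time = dict(b)
    return [(t, v + b_by_time[t]) for t, v in a if t in b_by_time]


s1 = [(0, 1.0), (10, 3.0), (20, 5.0), (30, 7.0)]
s2 = [(0, 10.0), (10, 10.0), (30, 10.0)]
print(apply_retention(s1, now=30, max_age=20))  # [(10, 3.0), (20, 5.0), (30, 7.0)]
print(downsample_mean(s1, window=20))           # [(0, 2.0), (20, 6.0)]
print(add_series(s1, s2))                       # [(0, 11.0), (10, 13.0), (30, 17.0)]
```

A real TSDB performs the same kinds of operations, but on compressed, time-ordered storage rather than on lists held in memory.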

Types of TSDB time-series databases
According to the type of licence, the databases are classified as open-source, commercial, academic and industrial databases (figure 3).
InfluxDB is a purpose-built, open-source time-series database. InfluxData also offers it in two versions, InfluxDB Enterprise and InfluxDB Cloud. The InfluxDB data model has a line protocol for sending time-series data, which takes the following form: measurement name, tag set, field set and timestamp. The measurement name is a string, the tag set is a collection of key/value pairs in which all values are strings, and the field set is a collection of key/value pairs in which the values can be int64, float64, bool or string. Most other time-series solutions support only float64 values, which means that the user cannot encode additional metadata along with the time series. Due to all these factors, InfluxDB is regarded as one of the best solutions for working with time-series data [11].
Open Time Series Database (OpenTSDB) [12] is a time-series database built on HBase. It has excellent read and write performance, scales very well and is distributed free of charge. OpenTSDB can collect, store and serve billions of data points without loss of precision, which makes it an ideal solution for monitoring systems. It consists of a Time Series Daemon (TSD) and a set of command-line utilities. Interaction with OpenTSDB is achieved primarily by running one or more TSDs. Communication with a TSD takes place over a simple telnet-style protocol, an HTTP API or a simple built-in GUI. All communication uses the same port (the TSD determines the client protocol by looking at the first few bytes it receives). Data is stored as time series, and each time series is a collection of data points.
Prometheus is an open-source time-series DBMS and monitoring system [12]. Graphite is an open-source data logging and graphing tool for time-series data [14]. RRDtool is an industry-standard, high-performance data logging and graphing system for time-series data. It is easily integrated into Perl, Python, Ruby, Lua or Tcl scripts and applications [15].
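The InfluxDB line protocol described above (measurement name, tag set, field set, timestamp) can be illustrated with a small formatter. This is a hedged sketch and not an official client: the function names are hypothetical, and the escaping and type-suffix rules reflect the author of this example's reading of the line protocol documentation (integers carry an `i` suffix, strings are double-quoted, and commas, equals signs and spaces in keys and tag values are backslash-escaped). Production code should use an official client library.

```python
from typing import Dict, Optional, Union

FieldValue = Union[int, float, bool, str]


def _escape_key(text: str) -> str:
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(value: FieldValue) -> str:
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"           # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'            # strings are double-quoted


def to_line(measurement: str, tags: Dict[str, str],
            fields: Dict[str, FieldValue],
            timestamp_ns: Optional[int] = None) -> str:
    """Build one line: measurement[,tags] fields [timestamp]."""
    if not fields:
        raise ValueError("at least one field is required")
    head = measurement.replace(",", r"\,").replace(" ", r"\ ")
    for key in sorted(tags):         # sorted tag keys give a stable output
        head += f",{_escape_key(key)}={_escape_key(tags[key])}"
    body = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    return line if timestamp_ns is None else f"{line} {timestamp_ns}"


print(to_line("temperature", {"node": "n1", "room": "lab 2"},
              {"value": 21.5, "battery": 87, "ok": True},
              1_600_000_000_000_000_000))
# temperature,node=n1,room=lab\ 2 value=21.5,battery=87i,ok=true 1600000000000000000
```

The example shows how one record mixes float, integer and boolean fields, which is the flexibility the text contrasts with float64-only solutions.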
TimescaleDB [16] is also offered as a cloud implementation, Timescale Cloud, which can be deployed in the cloud of a selected provider such as AWS, Azure or Google Cloud.
GridDB is optimised for IoT and big data. Its design is based on offering a universal data store that is optimised for IoT, provides high scalability and performance, and ensures high reliability. GridDB supports many time-series functions, such as data compression for ever-growing time-series data. APIs and other access methods: Graphite protocol, HTTP REST, Telnet API [17].
The common characteristics of the different TSDBs are:
• storing time series without loss of resolution;
• high recording frequency (up to milliseconds);
• scaling to millions of records per second;
• horizontal scaling (increasing storage capacity by adding nodes);
• providing a graphical interface for plotting;
• read/write access via HTTP.
Time-series databases are the fastest-growing segment of the database industry. Their popularity according to the independent website DB-Engines is shown in figure 4. The choice of a specific database may depend on the specifics of the application that uses the sensor data.

Online time series storage services
A cloud database is a type of database service. It typically works as a standard database solution, implemented by installing database software on a cloud infrastructure. Internet services for the storage and processing of sensor data are usually cloud services that are built, deployed and delivered through a cloud platform. Such a platform allows users to send their data to the cloud directly from the sources and to store it and manage its life cycle. One delivery model is platform as a service (PaaS), which allows organisations, end users and their applications to store, manage and retrieve data from the cloud. Access is possible directly through a web browser or through an API supplied by the provider for integrating applications and services.
Current trends in the storage and processing of sensor data involve sending requests to the service and using the provided API for different types of data processing, with the results displayed in graphical form. The service works in multi-user mode, allowing each user to work safely with their own data set. Different services have different charging policies: the amount of data stored free of charge is limited, or users are exempt from charges provided they make their data publicly available, or the service is provided free of charge without a guaranteed level of service, etc. The best-known services are Xively, TempoIQ and Nimbits Public Cloud [6], which are specially designed for storing and analysing time-series data from sensors, smart meters, servers and more.

Conclusion
The paper presents and analyses a block diagram for the integration of sensor data into a cloud structure, which addresses important issues concerning the cost of the sensor network and the possibilities for storing and analysing sensor data in specialised databases. It also analyses modern solutions for the storage and processing of sensor data, identifying the storage models, the approaches and the possibilities for their application.
The known time-series databases (TSDB), specially designed for storing data from sensors, are systematised. Databases designed for time-series data store data 1/3 to 1/2 more efficiently and provide high performance for writes and queries. One of the most preferred databases is InfluxDB, as it provides plug-in capabilities, multi-cloud support, a data analysis solution, a query builder, powerful client libraries and 24-hour support for open-source users. InfluxDB processes data quickly, at a resolution of milliseconds, microseconds or nanoseconds, which makes it a preferred database for financial and scientific purposes, while its excellent compression mechanism and storage architecture make it suitable for monitoring systems.