An Embedded Cloud Design for Internet-of-Things

Internet-of-Things (IoT) consists of interconnected heterogeneous devices that ubiquitously interact with the physical world. The devices are often resource constrained in terms of energy, computation, and communication resources. Distributing processing between these heterogeneous devices could yield better performance and sharing, and extending resources of the devices could yield more intelligent ubiquitous applications. Such a design can be called an “embedded cloud”, which is defined in this paper. An embedded cloud design is presented that consists of a distributable Process Description Language (PDL), a Distributed Middleware (DiMiWa), and an infrastructure. As a result, PDL can execute distributed processes and share resources as services over heterogeneous IoT devices with the help of DiMiWa and the infrastructure. The design is evaluated with a prototype implementation, where PDL and DiMiWa are executed on a small 8-bit microcontroller-based IoT device. The implementation requires only 5122 B of program memory (4% of the available), consumes under 1 ms of CPU time per process in the worst case, and allows over 100 simultaneous services per device.


Introduction
Internet-of-Things (IoT) consists of heterogeneous networked embedded devices that communicate through wired or wireless links or the Internet to create new ubiquitous, mash up, context aware, and information-based applications [1][2][3]. Often these embedded devices measure or interact with the physical world ubiquitously, and they consist of wired sensors, RFIDs, Wireless Sensor Networks (WSNs), and/or mobile devices. Typically, IoT devices have limited communication and processing capabilities due to energy-consuming communication, small form factor, battery operation, and long lifetime expectations.
Distributing and networking tens or hundreds of IoT devices enables intelligent ubiquitous applications. Distributing the processing or the application logic over the networked IoT devices is an important feature [3], since in-network processing can improve energy efficiency and data delivery reliability due to the reduced communication and congestion, especially on resource-constrained WSNs [4][5][6][7]. Current IoT abstractions that hide device heterogeneity from end-user application development do not take distributed or in-network processing into account. The processing is done in the infrastructure, and the IoT devices are mainly used as heterogeneous data providers that are homogenized for the end-user applications [3,4,8]. The heterogeneity of the IoT devices makes distributing processing difficult, which may be the reason for the lack of such proposals. In this paper, heterogeneity means wired and wireless measuring and actuating technologies that use different communication methods and data formats without any direct device-to-device interoperation compatibility.
Modern mobile phones already use cloud services to extend their resources, such as iCloud on Apple devices and SkyDrive on Microsoft devices. On current IoT abstractions, IoT technologies work only as data providers and the application is implemented on the infrastructure [4]. However, the heterogeneous IoT devices in one location could potentially implement the application on their own, if they could expand and share their resources. As found out by Parwekar [3] and the authors of this paper [4], there is a lack of design proposals for IoT devices that would allow distributing processing and expanding resources between different IoT technologies.
International Journal of Distributed Sensor Networks
We propose embedded cloud as a solution that would share, distribute, and expand resources of heterogeneous IoT technologies. As a contribution, we define the requirements for an embedded cloud and propose an embedded cloud design, which extends resources of measuring and actuating IoT devices. The novelties of the presented design are as follows.
(i) Heterogeneous IoT technologies can access external resources (services of other IoT technologies and cloud processing) from the embedded cloud and virtually extend their own resources to the cloud services through the presented Distributed Middleware (DiMiWa).
(ii) The application logic is developed with a device independent Process Description Language (PDL), which is simple enough to be implemented and executed even on small 8-bit resource-constrained WSN devices, and versatile enough to allow implementing complex processes with a small memory footprint.
(iii) The suitable parts of the application logic can be executed on the IoT devices; there is no need to route everything through a central arbitrating server. Further, technology specific processing can be harnessed, which makes it possible to achieve the energy conservation benefits of in-network processing [4][5][6][7].
(iv) The design does not discriminate against any technology; a solution to connect any measuring or actuating device as a part of the proposed embedded cloud design is presented.
PDL allows distributable process execution and a simple platform independent process creation with a small memory footprint. DiMiWa abstracts technology heterogeneity into services and provides sharing and expanding of the services (both local and remote services without visible separation to the application) while ensuring efficient use of the IoT technologies. The infrastructure provides PDL and DiMiWa execution environment for those technologies that cannot match the running requirements. Also, the infrastructure works as a communication arbiter and a data storage and provides those processing tasks that are too resource consuming for the connected IoT devices. Together these three components form an embedded cloud. The design is evaluated by studying the feasibility with a prototype implementation, three use cases, a scalability evaluation, and a comparison of the features against the related research.
The paper is constructed as follows. Section 2 gives our definition and requirements for an embedded cloud. Section 3 covers the related work of IoT clouds and the components of the presented embedded cloud design. The embedded cloud design is given in Section 4, and it is evaluated with a prototype implementation in Section 5. Section 6 discusses open questions in the design. Finally, Section 7 concludes the paper and gives future work.

The Definition of the Embedded Cloud
Defining cloud computing has been a cumbersome task and no widely accepted definition can be found [9,10]. Infrastructure, platform, and software as a service (IaaS, PaaS, and SaaS) are the main cloud computing approaches, which are results of service oriented architecture (SOA). At the time of writing, a Google search of the "embedded cloud" gives approx. 25000 hits, and the first hits redefine "cloud" of infrastructure connected IoT devices as an embedded cloud [11][12][13]. A Google Scholar search gives approx. 248 hits for the scientific publications discussing "embedded cloud". However, most of the findings are not computer system studies. IEEE Xplore search engine does not give hits for the "embedded cloud" term; a search with separated terms over AND operation gives 329 hits from the relevant research fields.
Current "embedded", IoT, or sensor cloud proposals utilize IoT devices only as heterogeneous data providers and use existing cloud computing approaches for homogenizing and further refining the data for the end-user applications [3,8,14]. In such papers, the embedded cloud is seen as an assorted collection of homogenized data producers. However, we consider embedded cloud to have more requirements from the IoT devices. Therefore, we propose a definition for the embedded cloud.
Cloud computing provides virtualized computing services without logical or location relation to physical hardware [10]. This is impossible in the embedded world due to tight relation with the physical world: sensors measure specific quantities at the specific locations and at the specific time [2]. Thus, the embedded cloud has two paradigms: (1) the heterogeneous IoT devices provide homogenized services to the end user through the embedded cloud, such as measurement data and actuator controls. (2) The embedded cloud provides and shares services between the IoT devices, such as processing power, storage space, and those services that do not exist on a particular IoT device. Then, the IoT device itself can virtually extend its resource to the embedded cloud. If these two paradigms are accomplished, creating ubiquitous applications could be revolutionized, since the resource constraints of the smallest IoT devices would no longer restrict the development, not even at the local domain of the device itself. Figure 1 illustrates this definition of the embedded cloud.
The general requirements for an embedded cloud design are as follows.
(1) It should homogenize data accessing and processing of heterogeneous IoT technologies for the end user.
(2) It should extend resources of the IoT technologies.
(3) It should distribute processing so that the IoT technologies can share resources and make the most of the available resources.
All three requirements are complicated due to the heterogeneous IoT devices with varying computing, communication, and energy resources. First, homogenization is difficult, since communication models, sampling rates and accuracies, data access notations, and so forth are different between technologies. Second, extending the resources is difficult, for example, if the device is only capable of sending and receiving few bytes occasionally. Third, distributing the processing and sharing resources require approaches where computation, communication, and energy resource differences are solved. The resulting design should be lightweight to work on the most resource-constrained IoT devices; yet, the design should be agile and versatile to cover the vast application space of IoT. Finally, through these requirements the embedded cloud should allow easier application development for the IoT technologies without the limitations of the resource constraints and without tailoring due to the technology heterogeneities.
[Figure 1: Embedded cloud for IoT: discover, share, store, and process data. End user interface to the embedded cloud: the end user sees one resource rich homogenized network of services without knowledge of the delivering hardware (access data from any embedded device; create and inject processing tasks to the cloud). Embedded device interface to the embedded cloud: any Internet connected embedded device sees a supply of virtual resources and services (access data that is not produced by the device; store and process data without resource constraints).]

Related Work
The critical problem in current heterogeneity solving proposals is that they do not harness the processing capabilities of the connected IoT technologies. The proposals fall into two categories. First, an adaptation layer is used as a common gateway for heterogeneous technologies. In these solutions, such as OGC SWE [15] and GSN [16], the data produced and/or consumed by the sensor networks is adapted for the end-user application that runs on a server infrastructure [4,17]. These solutions do not provide real device-to-device or technology-to-technology interoperation and cannot solve the basic use case: how to allow one device to utilize a measurement of a different device. They only remove the heterogeneity of the connected technologies for the end-user application. Further, the processing capabilities of the technologies are not harnessed. Second, knowledge sharing points allow heterogeneous devices to interchange data, as in Smart-M3 [18]. These solutions do not share processing, extend resources, or harness the processing capabilities, and they require tailored application development on each device.
Current sensor cloud or IoT cloud proposals use IoT devices as data producing sensors and data consuming actuators, similar to the heterogeneity adaptation proposals. IoT clouds use cloud computing for storing data, refining data, and providing refined services to end users; such proposals can be found, for example, in [14][19][20][21][22][23]. These proposals use cloud computing due to two factors. First, cloud computing can dynamically scale to the ever increasing number of devices, amount of data, and number of end users, since IoT can potentially have many thousands of devices producing vast amounts of data. Second, cloud computing provides a cost-effective infrastructure to process the vast amount of data, while scaling to the random requests from the end users.
The RoboEarth project proposes similar approaches for autonomous mobile robots. RoboEarth stores records of the physical world and action recipes in a database to be shared between Internet connected robots [24]. Our design is for wider interaction between heterogeneous IoT devices. Also, our design is able to cover RoboEarth functionality, achieve it on smaller devices, and extend it with interaction of other nonrobotic embedded devices. The authors of this paper were not able to find any directly comparable IoT design in the scientific literature that would expand and share resources and distribute processing on the measuring and actuating IoT devices. However, the components of our design have comparable proposals.
Knowledge sharing proposals can be compared to DiMiWa. Smart-M3 [18] allows device-to-device data exchange through shared knowledge points, but it does not expand resources virtually or allow distributed processing. Gómez-Goiri and López-de-Ipiña propose a distributed middleware that shares data between heterogeneous IoT devices using RDF [25] and triples [26]. In addition to sharing, data can be queried from the triple space. The proposal of Gómez-Goiri and López-de-Ipiña does not distribute processing, and the Java ME implementation is complex and heavy for resource-constrained devices. Both middleware proposals require tailored application implementation on the connecting IoT device. Distributed WSN middleware can be compared to DiMiWa as well. SQL query-based middleware, such as TinyDB [27], provides a SQL-like interface to query and process measurements from a WSN; however, TinyDB cannot function between different technologies as DiMiWa does. TinyLIME [28] and Agilla [29] allow distributing data between mobile devices in a WSN through a tuple space. Again, these solutions cannot be distributed between different technologies. None of the presented related solutions expands the resources of the resource-constrained WSNs.
Virtual Machines (VMs) for WSNs and process description languages are the related work for the proposed PDL. Mate [30] is a well-known virtual machine for WSNs that allows distributing WSN applications to the resource-constrained WSN nodes without actual reprogramming. Mate itself does not support distributed processing and it functions close to the hardware. The proposed PDL allows process distribution and it utilizes abstracted services of DiMiWa. Process description languages are often XML based, such as Web Services Description Language (WSDL) [31], and intended for large scale computing. Thus, unlike PDL, they are not suitable for small resource-constrained IoT devices.

The Embedded Cloud Design
Our embedded cloud design consists of three layers as presented in Figure 2: the infrastructure, DiMiWa, and PDL. In this design, processes and services provide the homogenizing abstraction. A process is described as actions that use the services. Typical processing tasks (such as average calculation) are services as well. A process can be distributed onto the IoT devices where applicable. A domain is used to provide locality into the services. A device can only see those remote services that are in the same domain. Together these design components fill the requirements of the embedded cloud as shown in the following sections.
[Table (partial): messages between an IoT device and the embedded cloud; earlier rows and the name of the first message below are missing]
(name missing): The device requests new data for an existing service, and the data is delivered once.
Set service data: The device sets new data for a service.
Store service data: The device instructs the cloud to store new data of an existing service. The data is stored once.
Subscribe service: Subscribed service data is delivered continuously when new data is ready.
Unsubscribe service: The device unsubscribes from the data.
Set processing service: The device sets a service for a processing service.
[...] and the payload is typically constructed from service IDs, a small amount of data, or a domain. The data formatting (units) is part of the implementation. The messages allow the basic functionalities of service discovery, data delivery, and process injection. The service discovery happens during the registration. The IoT device informs the embedded cloud about itself, and the embedded cloud responds with the available services. The data delivery is performed with the service data pushing and pulling, getting and setting, and subscribing and unsubscribing. The distributed processing is done by the infrastructure pushing and pulling the processes and by the IoT device creating processing service chains.
The communication parameters, such as node and network IDs, should be in the headers of the packet that encapsulates the message. An adapter may be required to translate the IDs, which is a part of the implementation of DiMiWa.

Database.
Each registered device is stored in the Node DB, which contains a node ID, a domain, a key to the Network DB, and relations to the Service DB and the Process DB. The Network DB contains connection information for each network: an IP address, a port, and a packet wrapping format, if required. The Service DB keeps track of the services of each node and in each domain, and the Process DB keeps track of PDL processes. The Blob DB stores the data delivered to the embedded cloud and the data generated by the processing tasks in the infrastructure. The data has a time stamp of its arrival.
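The relations between the node and network records can be sketched as C record types. This is an illustrative layout only: the field names, field widths, the flat arrays, and the helper `node_find` are assumptions and do not describe the schema of the prototype.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Network DB row: how to reach one network. */
typedef struct {
    uint32_t ip_address;   /* IPv4 address in host byte order (assumption) */
    uint16_t port;
    uint8_t  wrap_format;  /* 0 = no packet wrapping required (assumption) */
} network_row_t;

/* Hypothetical Node DB row: one registered device. */
typedef struct {
    uint32_t node_id;
    uint8_t  domain;
    uint16_t network_key;  /* key (index) into the Network DB */
} node_row_t;

/* Linear search over the Node DB; returns NULL for an unregistered node. */
static const node_row_t *node_find(const node_row_t *nodes, size_t count,
                                   uint32_t node_id)
{
    assert(nodes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].node_id == node_id) {
            return &nodes[i];
        }
    }
    return NULL;
}
```

The `network_key` of a found node would then select the `network_row_t` that tells the infrastructure where to send packets for that node.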

Processing.
The infrastructure has three main processing tasks. First, it must handle the communication between devices running DiMiWa. Second, it must execute DiMiWa and PDL for those technologies that cannot execute them on their own. Third, it must run processing services that are too heavy to run on typical IoT devices, for example, FFT, pattern recognition, and artificial intelligence.
The communication handler approves node registrations, delivers domains and services to the nodes, stores data to and requests data from the blob store, handles the data subscriptions, and delivers data to the nodes. If one node issues a store or get to a remote service and the cloud does not have recent enough data from that service, the infrastructure can request new data from the service owner.
The in-cloud processing and resource sharing create vast possibilities of distributed intelligence similar to RoboEarth [24]. For example, one could create a small battery-operated mobile robot with a camera and run a face detection on that device. If the mobile robot were connected to the embedded cloud, it could store every face in the cloud. Further, the face detection processing could be moved to the cloud as well, if this would, for example, save energy. Even further, if the robot is replicated, all the robots could share the same face database, and when one robot adds a new face to the cloud, all the robots would be instantly able to detect that same face.
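The "recent enough" decision of the communication handler can be sketched as a single comparison against the arrival time stamp kept with the stored data. The function name, the enum, and the `max_age` threshold are assumptions for illustration; the text does not define how the threshold is chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { SERVE_FROM_STORE, REQUEST_FROM_OWNER } data_source_t;

/* Decide whether stored data can answer a get/store for a remote service.
 * arrival and now use the same clock; max_age is a hypothetical threshold. */
static data_source_t choose_source(bool have_data, uint32_t arrival,
                                   uint32_t now, uint32_t max_age)
{
    assert(!have_data || now >= arrival);
    if (have_data && (now - arrival) <= max_age) {
        return SERVE_FROM_STORE;
    }
    return REQUEST_FROM_OWNER;
}
```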

End-User Access.
We chose not to concentrate on the end-user application interface in this paper, since it is a large research problem on its own. The embedded cloud essentially provides access to its databases for the end user. In addition, the end user can add PDL processes and processing tasks to the infrastructure. However, the implementation is not straightforward or well abstracted with such a basic interface. We reckon that an additional application cloud infrastructure would be beneficial to improve data refinement and allow higher level processing. This infrastructure could potentially generate PDL processes automatically from a high level description of an application, as we presented in [4]. This application cloud on top of the embedded cloud is left as future work.

Distributed Middleware.
DiMiWa works as a middleware in the embedded cloud design that allows different technologies to connect to the embedded cloud. It provides a homogenized service interface for the PDL processes. The implementation can be executed on the device or in the cloud, which ensures that no technology is discriminated against. The execution possibilities of DiMiWa and PDL are presented in Figure 3. If a device/technology cannot satisfy the presented requirements of DiMiWa, the device/technology can be connected to the cloud through intermediate hardware, for example, with a Raspberry Pi as the actuator is connected in Figure 3. If the device/technology has an Internet connection capability, a virtual DiMiWa implementation can be run directly in the infrastructure, as the web camera is connected in Figure 3. Thus, DiMiWa is a distributed middleware over several technologies.
To ensure wide portability of DiMiWa, the requirements for the executing IoT device have been minimized: the IoT device must be able to send and receive data packets from the Internet (e.g., through a tailored gateway) with a packet size of at least 9 B (1 B packet type and at least 8 B of payload), it must provide a time stamping method, and it must be able to execute a DiMiWa implementation. As the only communication requirements are a 9 B packet and two-way communication, DiMiWa can be used on top of several IoT communication technologies, such as the often referenced ZigBee [32], 6LoWPAN [33], and CoAP [34].
DiMiWa consists of two interfaces: an application programming interface (API) for PDL and the message interface that is used to communicate with the infrastructure. The design is kept minimal to ensure wide portability. In addition to these two interfaces, a service cache is used to keep track of local and remote services. DiMiWa implementation can have technology specific features; for example, a DiMiWa implementation for a WSN could deliver data straight from node to node and bypass the infrastructure to reduce packet amount, or a WSN could select process runner according to the routing hierarchy and available in-network services.

4.2.1. An API to PDL.
A PDL process can store a value of a service, trigger on a service, get a value of a service, set a value of a service, and instruct a service or data as an input to a processing service. The API is presented in Table 3. PDL uses these interface functions in its execution and the PDL design follows this same service paradigm. However, the IoT device can implement other applications on top of the DiMiWa API as well.

Service.
A service contains an identifier, a domain, and a class. The service identifier and the domain together form an address space that is used to distinguish and access the services. The physical relation is abstracted; thus, the service user cannot know which physical device is actually implementing the service. However, the identifier and the domain restrict the physical location of the service implementer.
The service class contains a flag for local and remote services. The class itself provides information on the actions that are usable in the DiMiWa process. The classes are BLOB, SAMPLE, and EVENT. A BLOB class service produces an amount of data that cannot be presented in one variable, for example, images, graphs, or sounds. The intention is that resource-constrained devices do not try to access BLOB class services (it is not forbidden though). SAMPLE class services produce a measurement or take an adjustment parameter of the size of one variable on request or at intervals. EVENT class services produce measurements of the size of one variable after some events.

Service Cache.
Node mobility is evident in the embedded cloud, which causes services to appear and disappear dynamically: storing and discovering methods are needed. A service cache is used to store local and remote services and their data. DiMiWa registers local services in the infrastructure upon a registration. As a response, the infrastructure delivers available remote services according to the domain. All used services are stored in the service cache, and they all have a Time-To-Live (TTL) value that ensures cleaning of disappearing remote services eventually. The local services have an infinite TTL, and they stay permanently in the service cache.
The service cache holds the newest value for each SAMPLE and EVENT service. BLOB services are stored only in the infrastructure. The get operation returns the value for the SAMPLE and EVENT class services. The trigger operation returns a true boolean value if there is a value entry with a recent enough time stamp. The time threshold can be selected in the implementation, taking technology specific packet delays into account. In the case of a store operation for a local service, the value from the cache is sent to the infrastructure.
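The cache behaviour described above (TTL expiry of remote services, permanent local services, and a trigger based on time stamp age) can be sketched as follows. The fixed-size array, the `TTL_INFINITE` marker, the tick-based TTL unit, and all function names are assumptions; the prototype is described as using a linked list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TTL_INFINITE 0xFFu  /* assumed marker for local services */
#define CACHE_SIZE   8u

typedef struct {
    bool     used;
    uint32_t service;
    uint8_t  ttl;        /* remaining lifetime in ticks */
    int32_t  value;      /* newest SAMPLE or EVENT value */
    uint32_t timestamp;  /* device-local time of the value */
} cache_entry_t;

typedef struct { cache_entry_t e[CACHE_SIZE]; } cache_t;

static cache_entry_t *cache_find(cache_t *c, uint32_t service)
{
    assert(c != NULL);
    for (size_t i = 0; i < CACHE_SIZE; i++) {
        if (c->e[i].used && c->e[i].service == service) return &c->e[i];
    }
    return NULL;
}

/* Insert or refresh an entry; returns false when the cache is full. */
static bool cache_put(cache_t *c, uint32_t service, uint8_t ttl,
                      int32_t value, uint32_t now)
{
    cache_entry_t *e = cache_find(c, service);
    for (size_t i = 0; e == NULL && i < CACHE_SIZE; i++) {
        if (!c->e[i].used) e = &c->e[i];
    }
    if (e == NULL) return false;
    e->used = true;
    e->service = service;
    e->ttl = ttl;
    e->value = value;
    e->timestamp = now;
    return true;
}

/* Age all entries by one tick; expired remote services are dropped. */
static void cache_tick(cache_t *c)
{
    assert(c != NULL);
    for (size_t i = 0; i < CACHE_SIZE; i++) {
        cache_entry_t *e = &c->e[i];
        if (!e->used || e->ttl == TTL_INFINITE) continue;
        if (e->ttl > 0) e->ttl--;
        if (e->ttl == 0) e->used = false;
    }
}

/* Trigger: true if a value exists and is at most max_age old. */
static bool cache_trigger(cache_t *c, uint32_t service,
                          uint32_t now, uint32_t max_age)
{
    const cache_entry_t *e = cache_find(c, service);
    return e != NULL && (uint32_t)(now - e->timestamp) <= max_age;
}
```

Refreshing an entry through `cache_put` also resets its TTL, so a remote service that keeps delivering data never expires.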

Process Description Language.
PDL was designed to be platform and programming language independent, to have a small memory footprint when implemented, and to not require real multitasking to ensure the suitability for the resource-constrained IoT devices. PDL itself provides a cooperative multitasking for its processes.
A PDL process is a series of known size actions that interact with DiMiWa services. Each action in the series is executed step by step in a state machine. PDL resembles an instruction set of a virtual machine, but the operands are local or remote services instead of typical register and memory accesses of a CPU. The known size of the actions and the following operands make parsing and implementing the executing state machine easy. Each process has an accumulator register ACCU for storing and manipulating values returned by the services. Table 4 presents all the actions of PDL. The actions allow accessing the DiMiWa services, creating timed actions, manipulating the execution flow, manipulating the accumulator, and instructing the DiMiWa processing services. The small amount of the actions ensures that the implementation is lightweight.
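The step-by-step execution of fixed-size actions can be sketched as a small state machine that executes exactly one action per call, which is what makes cooperative multitasking between processes possible. The opcodes `ACT_SET`, `ACT_JUMP_IF_ZERO`, and `ACT_END` are hypothetical stand-ins, since the real action set is defined in Table 4, and the operands here are immediates rather than DiMiWa services.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes, not the PDL action set of Table 4. */
typedef enum { ACT_SET, ACT_JUMP_IF_ZERO, ACT_END } opcode_t;

/* Every action has the same known size: an opcode and one operand. */
typedef struct {
    uint8_t op;
    int32_t operand;
} action_t;

typedef struct {
    const action_t *actions;
    size_t          count;
    size_t          pc;    /* index of the next action */
    int32_t         accu;  /* per-process accumulator (ACCU) */
} process_t;

/* Execute exactly one action; returns 0 once the process has ended. */
static int process_step(process_t *p)
{
    assert(p != NULL && p->actions != NULL);
    if (p->pc >= p->count) {
        return 0;
    }
    const action_t *a = &p->actions[p->pc];
    switch (a->op) {
    case ACT_SET:
        p->accu = a->operand;
        p->pc++;
        return 1;
    case ACT_JUMP_IF_ZERO:
        p->pc = (p->accu == 0) ? (size_t)a->operand : p->pc + 1;
        return 1;
    default:  /* ACT_END or an unknown action stops the process */
        p->pc = p->count;
        return 0;
    }
}
```

A scheduler would call `process_step` once for each process in turn, so no process can block the others for longer than one action.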
PDL requires that a timing function is called once a second, which sets the timing granularity. If the target technology cannot provide this granularity, PDL must be implemented in the infrastructure.
The execution flow can be manipulated with two different actions. First, a timeout can be created with TIMEWINDOW. If the following action does not proceed or produce a TRUE result within the time window, the process is restarted. The accumulator can be manipulated with set, add, subtract, multiply, and divide operations. The operand can be either an immediate value or a DiMiWa service. The arithmetic is saturating; thus, accumulator overflow is not possible.
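Saturating accumulator arithmetic can be sketched as below, assuming a signed 32-bit ACCU; the text states the 32-bit width but not the signedness. The sketch uses 64-bit intermediates for clarity, whereas an 8-bit compiler without 64-bit support would need explicit overflow checks before each operation. Division by zero is treated as a caller error here, which is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a wide intermediate result into the 32-bit accumulator range. */
static int32_t clamp32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

static int32_t accu_add(int32_t a, int32_t b) { return clamp32((int64_t)a + b); }
static int32_t accu_sub(int32_t a, int32_t b) { return clamp32((int64_t)a - b); }
static int32_t accu_mul(int32_t a, int32_t b) { return clamp32((int64_t)a * b); }

/* INT32_MIN / -1 does not fit in 32 bits, so it saturates as well. */
static int32_t accu_div(int32_t a, int32_t b)
{
    assert(b != 0);
    return clamp32((int64_t)a / b);
}
```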
The end user can add new processes to the embedded cloud. The embedded cloud then resolves the best execution place for that process. The process could be even split into smaller PDL action series, if required. However, algorithm and design for splitting the PDLs are left as future work.

Evaluation with a Prototype
The evaluation studies the feasibility of the presented embedded cloud design. First, we evaluate the implementation feasibility of PDL and DiMiWa on resource-constrained IoT devices. Second, we discuss scalability issues. Third, we evaluate the usability of the PDL processes with example use cases. Finally, we present a comparison of features to existing related work.

Implementation Feasibility.
We have written a portable implementation of DiMiWa in the C programming language. The portability requires two interfaces in addition to the API and messages of the embedded cloud design: a platform interface and a portable interface. The platform interface must be called by the IoT device running the DiMiWa implementation. The portable interface must be implemented by the IoT device.
The platform interface has three functions: handling the received packets, keeping up the connection to the embedded cloud, and resolving the domain, if the device physically moves.
The portable interface implements the following functions. An initialization function does all the device specific initializations, for example, adding the local services to the DiMiWa service cache. A wrapped packet allocation is provided, where the packet holds room for the IoT technology specific communication and the DiMiWa packet and the port fills the device specific headers and trailers. A function for sending a packet towards the Internet is provided. A function to get the time for DiMiWa cache is implemented. It should be noted that the time format can be anything, since the time is not delivered to the embedded cloud. Getting and setting functions are implemented for the values of the local services implemented by the device. Finally, memory allocation and freeing functions are implemented.
PDL provides four interface functions: pdle_clock_tick() must be called once a second to create the internal timer, pdle_add_process() adds a new process to the execution, pdle_remove_process() removes a running process from the execution, and pdle_run() executes the running processes.
In the DiMiWa implementation, one service is 32 bits, one data entry is a 32-bit integer, the class is 8 bits, and the domain is 8 bits. One DiMiWa cache entry requires memory for 10 B of data (a service, a class, a TTL, a value, and a time stamp) and two pointers.
Listing 1 presents how PDL and DiMiWa are executed using threads in our HybridKernel [35]. These threads are built on protothreads proposed by Dunkels et al. [36], which are used in Contiki [37]. Thus, the same execution model should work for Contiki as well. Both pieces of system software are designed for small 8-bit microcontrollers that are often used on resource-constrained WSN devices.
Our prototype implementation was tested using two TUTWSN WSN devices [39] equipped with PIC18F8722 8-bit microcontrollers [38], two Raspberry Pi devices [40], and a laptop running the infrastructure as shown in Figure 4. The WSN devices implemented a Passive Infra-Red (PIR) motion detector service and a temperature measurement service, one Raspberry Pi implemented a camera service with a USB web camera, and the other one implemented a sound producing on/off actuator service. The packets were delivered using UDP between the Internet connected devices.
On the TUTWSN device, the PDL implementation takes 1900 B, and DiMiWa implementation takes 3222 B in program memory. This totals to 5122 B, which is 4% of the available 128 KB program memory. These values were gathered with Microchip MCC18 compiler using optimizations. The values do not contain the library functions (such as a linked list), drivers, protocol stack, or operating system. From the implementation results it can be concluded that PDL and DiMiWa can be implemented for a small 8-bit IoT device. The Raspberry Pi implementation executable is a 23.5 KB ELF file.
Executing a process on PDL requires varying time on each step and iteration. Table 5 presents the worst case execution times averaged for the PIC18F8722 implementation running at 4 MHz (1 instruction per μs). The execution times were gathered by probing MCU pins with an oscilloscope, running a test PDL process, and toggling pins according to actions. Jumps and operations manipulating ACCU require varying time due to the 8-bit MCU and 32-bit operations that are implemented in software by the compiler. The figures are considered reasonable, and the presented implementation can run over 100 PDL processes on the PIC18F8722 as shown in the scalability section. The execution times do not contain protocol stack operations (e.g., sending a packet).

Scalability.
Scalability should be considered for small resource-constrained WSNs and IoT technologies, since their memory, energy, and communication resources are often limited. Thus, resource-constrained WSNs are the possible bottlenecks of the presented design. The main scalability issue is the amount of services and processes that one device can potentially handle. Eventually, data memory and execution time will run short.
With the presented TUTWSN and PIC18F8722 implementation, DiMiWa and PDL have around 1968 B of data memory at their disposal. As described in Section 5.1, one service entry in the DiMiWa cache requires 10 B of data memory and two pointers. Since pointers are 2 B on the PIC18F8722, this totals 14 B per cache entry, and the upper limit for subscribed services on one device is 1968 B/14 B = 140. The PDL processes can be stored in program memory, but each PDL process requires 16 B of data memory, so the upper limit for processes is 1968 B/16 B = 123. Since processes and services compete for the same data memory, a compromise is needed, and the numbers of both processes and services need to be carefully controlled. The execution time depends on the number and structure of the PDL processes. The presented implementation runs PDL every 100 ms (Listing 1), and in the worst case one action takes 800 µs. As a result, the 100 ms deadline will be breached with 125 PDL processes running a lengthy action at the same step. Since the data memory limits the device to 123 processes, the execution time will not restrict scalability on PIC18F8722-based platforms. However, execution consumes energy, and therefore the number of PDL processes should be carefully controlled.
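The budget arithmetic above can be reproduced with a short sketch. The constants are the figures reported in this section; the function and variable names are illustrative and not part of the prototype, and the sketch assumes that cache entries and PDL processes are the only consumers of the 1968 B pool.

```python
# Figures reported for the TUTWSN / PIC18F8722 prototype.
DATA_MEMORY_B = 1968                  # data memory available to DiMiWa and PDL
POINTER_B = 2                         # pointer size on the PIC18F8722
CACHE_ENTRY_B = 10 + 2 * POINTER_B    # service entry plus two pointers = 14 B
PROCESS_B = 16                        # data memory per PDL process
PERIOD_US = 100_000                   # PDL is run every 100 ms
WORST_ACTION_US = 800                 # worst-case duration of one action


def max_services(processes: int = 0) -> int:
    """Cache entries that fit after reserving memory for `processes`."""
    free = DATA_MEMORY_B - processes * PROCESS_B
    return max(free, 0) // CACHE_ENTRY_B


def max_processes(services: int = 0) -> int:
    """PDL processes that fit after reserving memory for `services`."""
    free = DATA_MEMORY_B - services * CACHE_ENTRY_B
    return max(free, 0) // PROCESS_B


def deadline_limit() -> int:
    """Worst-case actions that exactly fill one execution period."""
    return PERIOD_US // WORST_ACTION_US
```

Under these assumptions, `max_services()` and `max_processes()` give the two upper limits quoted above, and calling either with a nonzero argument shows how the two resources trade off against each other in the shared memory pool.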
The presented figures suggest that the implementation is usable on the PIC18F8722, and that a TUTWSN network of 100 devices could execute up to 12300 (100 × 123) different PDL processes, which should be sufficient. Resource-constrained WSNs typically have a limited number of duties, since their connections (sensors and actuators) to the physical world are limited.
In addition to the numbers of services and PDL processes, message exchange affects scalability. If a device subscribes to many remote services, the infrastructure might push too much data to the device. This could cause two drawbacks: first, the network becomes congested due to the number of delivered messages; second, the increase in data traffic might increase energy consumption. These effects depend on the underlying transport technology and routing topology, which makes estimating them very difficult, as the embedded cloud is designed to connect any technology. For example, on TUTWSN, the increase in data traffic is not critical if the messages are delivered in the so-called reserved slots [39]. Communication must take place in these slots even if there is no application data to be sent. However, it must be emphasized that the presented design allows the implementation of DiMiWa and/or the execution of the PDL process to reside on the infrastructure, as shown in Figure 3 with the web camera. This would remove the overheads from the technology. In the current design, the issue can be avoided with careful use of the PDL processes.
The scalability of the infrastructure is assumed to be unlimited in this paper. In practice, there is an upper limit on the number of connected devices, the amount of exchanged data, and the number of services, but we assume that exhausting the infrastructure is not a relevant problem if the design is deployed at large scale over distributed data centers or cloud platforms; for example, Facebook is currently able to serve over one billion users [41].

Example Use Cases.
With the following three use cases, we show the versatility, usability, and feasibility of the PDL processes. Listing 2 presents a basic temperature measurement process with 5-minute intervals. This is a typical task required of a WSN node. Listing 3 presents a process that detects humans in a picture after three distinct motion detectors have been triggered within a time frame of one minute. If a human is detected, a gate is opened for 5 minutes. This process could be used to separate humans from caged wildlife. Implementing this process on a resource-constrained WSN would be difficult due to the large size of the picture data and the processing power required by the image processing. Listing 4 presents a simple P-controlled temperature control that uses an averaged temperature, which would improve the result compared to a single point of measurement. This is a typical WSN middleware and in-network processing use case, for example, in building automation.
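The control logic described for Listing 4 can be illustrated with a minimal sketch. This is not the PDL process itself: the function name, the gain, the setpoint, and the output clamping range are all assumptions introduced for illustration.

```python
from statistics import mean


def p_control(temperatures, setpoint, gain, out_min=0.0, out_max=100.0):
    """Proportional control on the mean of several temperature readings.

    The output limits are hypothetical actuator bounds, not values
    taken from the prototype.
    """
    if not temperatures:
        raise ValueError("at least one temperature reading is required")
    error = setpoint - mean(temperatures)      # deviation from the target
    output = gain * error                      # proportional term only
    return min(max(output, out_min), out_max)  # clamp to the actuator range
```

Averaging over several sensors before computing the error is what distinguishes this from control based on a single point of measurement: one outlying reading shifts the mean only by its share of the total.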
The process in Listing 2 is 9 B in size, Listing 3 is 66 B, and Listing 4 is 62 B. Direct comparison to a tailored application or a Maté implementation is difficult due to the service approach of DiMiWa, but programming the same behavior with CPU-like bytecode would be expected to yield a larger footprint. The relatively small size of the processes allows a process to be injected into a WSN over the WSN protocol stack, without the need for program image or firmware distribution support [42]. The use cases were tested with the prototype implementation by simulating the physically missing hardware.
Listing 4: A 62-byte PDL process with P-controlled temperature control.
In Listing 3, the triggering could be executed on the first node that routes all three triggering messages, for example, the first routing node in Figure 3. When the three detectors (PIR, motion sensor, and sound sensor) raise an alarm and send their messages to the routing node, the node could drop the packets and send only the store command for the camera. Typically, this would reduce communication, energy consumption, and/or congestion [4][5][6][7]. The average calculation in the Listing 4 process could likewise be executed with in-network processing in a WSN, with the same benefits.
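The aggregation at the routing node can be sketched as follows. The class and method names are hypothetical, timestamps are assumed to be in seconds and non-decreasing, and the one-minute window and the three distinct detectors are taken from the description of Listing 3.

```python
WINDOW_S = 60          # time frame from the Listing 3 description
REQUIRED_SENSORS = 3   # distinct detectors that must trigger


class TriggerAggregator:
    """Absorbs detector alarms and reports when enough distinct ones agree."""

    def __init__(self, window_s=WINDOW_S, required=REQUIRED_SENSORS):
        self.window_s = window_s
        self.required = required
        self.last_seen = {}   # sensor id -> timestamp of its latest alarm

    def on_alarm(self, sensor_id, timestamp):
        """Return True if the camera store command should be sent."""
        self.last_seen[sensor_id] = timestamp
        # Forget alarms that have fallen out of the time window.
        self.last_seen = {
            s: t for s, t in self.last_seen.items()
            if timestamp - t <= self.window_s
        }
        if len(self.last_seen) >= self.required:
            self.last_seen.clear()   # start over after a trigger
            return True
        return False
```

In this sketch the node forwards nothing while `on_alarm` returns `False`, so three alarm packets are replaced by at most one outgoing command, which is where the reduction in communication would come from.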

Comparison.
Comparing the memory and execution overheads of the implementation is difficult, since it would require implementing all the related work on the same platform, and there are no existing comparable overhead figures. Also, the novelty of the presented design lies in the features that make it versatile. Therefore, we concentrate on comparing the features of related work to show the versatility of our design in Table 6. Although RoboEarth is not intended for IoT use, it has been included since it shares a similar approach. TinyDB supports application implementation, dissemination, and distributed processing within one technology only; thus, it is considered to support these only partly.

Open Questions and Discussion
The presented design is a starting point for embedded cloud designs. Open questions remain: for example, security, privacy, ontologies, and data semantics are key requirements for any IoT technology. Security and privacy are not discussed, since the presented design and implementation rely on the built-in security methods of the used technologies. Handling the encryption keys and so forth is an open problem that should be solved. Ontology and semantics govern the format and units of the data. One ontology can be selected in the implementation of the presented design. The ontology should provide identifiers for each device, and it should define units, ranges, accuracy, and so on for each measurement in the embedded cloud. In the prototype implementation, the units and accuracies were selected according to the available hardware.
Although connected devices register to the embedded cloud through DiMiWa and perform service discovery with the help of the infrastructure, there are management issues that were not discussed in this paper. One example is how the domains are set. In the prototype implementation we hard-coded the devices and their domains, but an automated method is needed to make this design more complete. This would be easy to solve if all the devices could produce WGS84 or similar coordinates for their location. On the other hand, the domain is often a description of a place or an identifier, such as a kitchen, a social security number, or a room number. Creating a completely automated system is a very difficult research problem.
The presented design could be improved for even better scalability with two extra components that keep track of the subscribed services and running PDL processes on each device. These components could be called the DiMiWa service broker and the PDL process broker. The DiMiWa service broker could monitor the number of services subscribed on the device, and when a threshold is reached, it could move some of the processing to the infrastructure to reduce the number of subscribed remote services. The PDL process broker could ensure that the device is not exhausted by its load, by moving PDL processes to the infrastructure if the device runs out of memory, execution time, or energy. These two brokers should cooperate to ensure the best utilization of the technologies. Therefore, they both require an understanding of the executing technology, but currently there are no technology models available that could be used as input data for these brokers. The brokers and the technology model are an important part of our future work.
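Since the brokers are future work, no implementation exists; the following is only a hypothetical sketch of the threshold decision a DiMiWa service broker might make. The `Process` record, the `select_offload` function, and the policy of offloading the heaviest subscribers first are all assumptions, and the sketch treats each process's remote subscriptions as disjoint.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    remote_services: int   # remote services this process subscribes to


def select_offload(processes, threshold):
    """Pick processes to move to the infrastructure until the number of
    subscribed remote services on the device is at or below `threshold`."""
    total = sum(p.remote_services for p in processes)
    offload = []
    # Assumed policy: move the heaviest subscribers first.
    for p in sorted(processes, key=lambda p: p.remote_services, reverse=True):
        if total <= threshold:
            break
        offload.append(p.name)
        total -= p.remote_services
    return offload
```

A real broker would also need the technology model mentioned above, for example to weigh the energy cost of the extra traffic that offloading causes against the memory it frees.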

Conclusions and Future Work
In this paper, an embedded cloud was defined as a design that homogenizes distributed processing, resource extending, resource sharing, and data accessing of heterogeneous IoT devices. A novel embedded cloud design was presented that consists of a distributable process description language, a distributable middleware, and an infrastructure. Together these components form an embedded cloud that expands resources, distributes processing, and hides heterogeneity even for the most resource-constrained IoT technologies. The prototype implementation was functional even on an 8-bit microcontroller-based WSN device.
As future work, we will study modeling IoT devices to describe their performance and capabilities, and we will try to create an algorithm that uses these performance descriptions to distribute the PDL processes more intelligently through the DiMiWa service and PDL process brokers.