XML-Based Information Fusion Architecture based on Cloud Computing Ecosystem

From an organizational and end-user computing point of view, cloud computing is a new paradigm for deploying, managing, and offering services through a shared infrastructure. However, current cloud computing application development lacks a uniform approach to heterogeneous information fusion, which leads to inefficient development and low potential for reuse. To address these issues, this study proposes a novel Web 2.0 Mashups as a Service, called WMaaS, as a fundamental cloud service model. WMaaS is built on an XML-based Mashups Architecture (XMA) that is composed of Web 2.0 Mashups technologies, including Web Data, Web API, Web Interaction, and Web Presentation, and that can be associated with existing service models. To demonstrate the feasibility of this approach, this study implemented a Ubiquitous Location-based Service System (ULSS), a cloud computing application developed on WMaaS that provides continuous, location-based schedule information for organization monitoring and end-user needs.


Introduction
Cloud computing has become one of the most promising information solutions and business trends in recent years. The term cloud computing was first introduced by Google's CEO Eric Schmidt. It refers to the important, long-term trend of computing over the Internet. Many institutions and companies provide definitions and solutions for cloud computing [Dillon, Wu and Chang (2010)]. However, there is still no widely accepted definition of cloud computing. The NIST [NIST (2019)] defines three types of cloud computing service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). SaaS provides services to cloud clients, while IaaS and PaaS provide services to cloud application developers. [...] satisfy customers' needs. Third, a cloud computing application, the Ubiquitous Location-based Service System (ULSS), is implemented on WMaaS to provide continuous, location-based schedule information for organization monitoring and end-user needs. ULSS integrates four emerging research areas: Web 2.0, Open Data, Cloud Computing, and Ubiquitous Context-awareness [Hsu (2013a)].

The remainder of this paper is organized as follows. The next section presents related work. Section 3 presents an XML-based Mashups Architecture (XMA) based on XML technologies. Section 4 describes Web 2.0 Mashups as a Service (WMaaS). Section 5 describes the implementation of a Ubiquitous Location-based Service System (ULSS) that demonstrates the feasibility of WMaaS. Finally, a summary and concluding remarks are given.

Related work
In recent years, more and more cloud service providers have published APIs that enable Web application and app developers to integrate open data and web services easily instead of developing them from scratch. Mashups can be considered an important medium in the evolution of the Web 2.0 era. Many recent studies discuss mashup applications in various domains [Boulakbech, Messai, Sam et al. (2016); Ghiani, Paternò, Spano et al. (2016); Zhang, Fu, Sun et al. (2016); Zhong, Fan, Tan et al. (2018); Wang, Wu and Hsu (2019)]. In Ghiani et al. [Ghiani, Paternò, Spano et al. (2016)], system developers can create new mashups from existing interaction components via a graphical environment. A mobile-application prototype [Boulakbech, Messai, Sam et al. (2016)] adopts mashup Web services to provide customized touristic plans. Lee [Lee (2015)] focuses on web service mashups and develops novel algorithms for the automatic discovery and composition of Web APIs. It should be noted that the above-mentioned studies did not provide a complete, flexible, and versatile mashup architecture to facilitate software development. Chavarriaga et al. [Chavarriaga, Jurado and Rubio (2017)] propose an XML-based domain-specific language approach for client-side Web applications. This study proposes the XML-based Mashups Architecture (XMA), built on XML technologies, to cope with the heterogeneity, flexibility, and versatility issues of mashup applications.

Cloud computing is an emerging paradigm in which computing resources are offered over the Internet. Using cloud computing resources raises two problems: cloud resource providers cannot easily publish their services for cloud users, and cloud users cannot easily find useful cloud resources. These problems result from the heterogeneity of cloud resources. The Mobile Ubiquitous Brokerage as a Service (MUBaaS) [Yan, Sun, Liu et al. (2016)] permits n devices of a user to access diverse cloud services. The Workflow Platform as a Service (WaaS) [Fan, Hussain and Hussain (2015)] lets users define and integrate workflow-based applications to facilitate the rapid development of cloud computing. Herbold et al. [Herbold and Hoffmann (2017)] propose model-based testing as a service, which uses cloud infrastructures to deal with the complexity of Web services. Additionally, many studies that integrate cloud service architectures with various application domains have been reported recently.

The service models of cloud computing are usually classified as Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) [Chakraborty, Ramireddy, Raghu et al. (2010)]. IaaS provides the hardware and administrative services that are required to store cloud applications, together with a physical platform for running them. These infrastructure resources, including computer servers, storage systems, network equipment, and data centers, offer basic storage and computing capabilities as standardized services over the Internet. Cloud developers typically deploy their applications on this infrastructure. Typical examples are the Amazon EC2 (Elastic Compute Cloud) service and S3 (Simple Storage Service), in which compute and storage infrastructures are publicly accessible as a utility.

PaaS provides an integrated environment or middleware with which cloud developers can implement and deploy cloud applications without the cost of, or need for, purchasing hardware or software. Kryukov et al. [Kryukov, Demichev and Polyakov (2016)] suggest that traditional global grid systems will shift toward providing convenient and efficient means of access to distributed cloud resources. Usually, a set of development tools and services runs on servers in the cloud. Practical examples include Google App Engine, the Microsoft Windows Azure platform, Salesforce Force.com, and Aneka. Google App Engine provides a Python or Java runtime environment and APIs with which developers can build Web applications on the same scalable systems that power Google applications. SaaS provides on-demand access to web-based applications that are maintained centrally by a provider.

Hadoop is a cloud computing platform developed as an open source project under the Apache Software Foundation. MapReduce and HDFS (Hadoop Distributed File System) are the two cores of Hadoop; the former enables distributed computing, and the latter provides a distributed file system with high fault tolerance. YARN is a cluster resource manager introduced in Hadoop 2. Apache Spark is also an open-source platform; it computes in memory and writes only the final results to the hard disk, which reduces I/O time. Apache Mesos is a distributed systems kernel on which frameworks such as Hadoop and Spark can run. Mesos treats the resources of multiple nodes, including CPU, memory, hard disk, and other computing resources, as a single high-performance computer [Saha, Beltre and Govindaraju (2018)].

XML-based mashups based on cloud ecosystem
Mashups are now a major part of Web 2.0 culture. To develop WMaaS, this study first makes the concept of Web 2.0 Mashups concrete. An XML-based Mashups Architecture (XMA) is proposed that comprises the Web Data layer, Web API layer, Web Interaction layer, and Web Presentation layer. This study also argues that XML should form the backbone of XMA, supporting the technologies relevant to the function of each layer.

XML-based mashups architecture
XMA is developed based on XML technologies, including XML, XML Schema, XSLT, XPath, RDF, OWL, and Namespace. This architecture is depicted in Fig. 1, which can be represented in four layers: Web Data layer, Web API layer, Web Interaction layer, and Web Presentation layer. The XML-based technologies are adopted across the four layers.
XMA provides a flexible infrastructure in which developers can dynamically add, replace, and remove components in each layer. Each layer contains multiple technologies, all of which provide a service suited to the function of that layer. WMaaS can cope with heterogeneous information fusion mainly because of its use of XML technologies. Because all layers are involved when a request is sent from a cloud client to a cloud computing application, upper layers must rely on lower layers to process cloud resources over the Internet. The Web Data layer is widely used to notify users of content changes at SaaS websites. The Web API layer facilitates data exchange between cloud computing applications and allows the creation of new applications. The Web Interaction layer supports interactive web technologies. The Web Presentation layer provides independence from differing data representations by translating an application format into a valid markup language for a specific cloud client.

Fig. 2 shows the semantic structure between XML and Web 2.0 Mashups as a UML class diagram. The goal of the UML class diagram is to give a graphical overview of the domain concepts and the relations among them. The components of Web 2.0 Mashups include Cloud Resource, Web Interaction, and Web Presentation. There are two primary cloud resources, namely data resources and service resources. Web Data is a typical cloud data resource, while Web API is a typical cloud service resource. There is a dependency relationship, annotated with a "core technology" stereotype, from Web 2.0 Mashups to XML. It indicates that XML is a core technology of Web 2.0 Mashups and implies that a change to XML may cause a change in Web 2.0 Mashups.
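As a rough illustration of how components might be added, replaced, and removed per layer, the following Python sketch models the four XMA layers as a small registry. The class name `XMARegistry`, its methods, and the component names are hypothetical; they are assumptions made for illustration and do not come from the original implementation.

```python
class XMARegistry:
    """Toy registry of named components per XMA layer (illustrative only)."""

    # The four layers named in the text.
    LAYERS = ("Web Data", "Web API", "Web Interaction", "Web Presentation")

    def __init__(self):
        self._layers = {layer: {} for layer in self.LAYERS}

    def _layer(self, layer):
        if layer not in self._layers:
            raise KeyError(f"unknown XMA layer: {layer}")
        return self._layers[layer]

    def add(self, layer, name, component):
        components = self._layer(layer)
        if name in components:
            raise ValueError(f"{name} is already registered in {layer}")
        components[name] = component

    def replace(self, layer, name, component):
        components = self._layer(layer)
        if name not in components:
            raise KeyError(f"{name} is not registered in {layer}")
        components[name] = component

    def remove(self, layer, name):
        # Returns the removed component; raises KeyError if it is absent.
        return self._layer(layer).pop(name)

    def names(self, layer):
        return sorted(self._layer(layer))


if __name__ == "__main__":
    registry = XMARegistry()
    registry.add("Web API", "XML-RPC", "xmlrpc-handler")        # hypothetical component
    registry.add("Web Presentation", "XHTML", "xhtml-renderer")  # hypothetical component
    registry.replace("Web Presentation", "XHTML", "xhtml-renderer-v2")
    print(registry.names("Web API"), registry.names("Web Presentation"))
```

Keeping each layer as an independent mapping means a component in one layer can be swapped without touching the others, which is the property the architecture description emphasizes.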

Web presentation
Web presentation technologies mainly provide a valid markup language for a specific cloud client. Markup languages include HTML, XHTML, XForms, KML, WML, and VoiceXML. Except for HTML, these presentation markup languages are all based on the XML standard, which facilitates visually precise rendering in Web 2.0 Mashups. There are a number of different approaches by which data can be extracted from web pages to facilitate the reuse of data [Varlamov and Turdakov (2016)]. Tab. 1 lists some of the most popular presentation markup languages.

Web interaction
Cloud computing applications require more interactive web technologies than traditional Web systems. AJAX (Asynchronous JavaScript and XML) is not a new technology or language, but a framework that combines various existing technologies, including XML, XHTML, XMLHttpRequest, and JavaScript. With the spread of XML and JavaScript, the AJAX framework has become a basic ingredient of Web 2.0 applications. It supports more interactive navigation in Web 2.0 applications, which has enhanced online collaboration and information sharing among users. Tab. 2 lists related technologies adopted by AJAX.

[...] [Máchová, Hub and Lnenicka (2018)]. In recent years, many countries have been actively providing Open Government Data (OGD) to their citizens to promote its reuse [Vracic, Varga and Curko (2016); Nascimento, Da Rocha and Garcia (2018)].

Linked Data is simply about using the Web to create typed links between data from different cloud resources [Bizer, Universität, Heath et al. (2009)]. The term Linked Data was coined by Tim Berners-Lee, who set out the following principles [Berners-Lee (2009)]:
(1) Use URIs as names for things.
(2) Use HTTP URIs so that people can look up those names.
(3) When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
(4) Include links to other URIs so that they can discover more things.

[...] [Mital, Pani, Damodaran et al. (2015)]. The Linked Open Data (LOD) Cloud contains many well-known LOD datasets, such as DBpedia, FOAF, W3C, and GeoNames. DBpedia is the largest LOD dataset and contains data extracted from Wikipedia. It contains about 3.4 million concepts described by 1 billion RDF-based relationships. Many novel studies and applications use DBpedia to obtain more useful information and thereby enhance their usability and innovation [Zhu, Ren, Liu et al. (2016)].

Web API
Web 2.0 Mashups use Web API technologies to facilitate data exchange between applications and allow the creation of new applications. Most Web APIs are built on the Web Services architecture [Sun, Rossing, Sinnema et al. (2010)]. They hide the details of the Web Services protocols from developers and are therefore easier to use. Web service composition is a new paradigm for developing Web-based systems [Jeong, Rana, Hsu et al. (2016)]. The following are the five main types of Web APIs. Detailed comparisons of the five Web APIs are shown in Tab. 3.

XMLHttpRequest
XMLHttpRequest is not a web service technology but a Web API that lets scripting languages transfer XML or other text data between a client and a server. It is used to communicate asynchronously with a server-side component and to update the source of an HTML page dynamically based on the response data. The data returned from XMLHttpRequest calls are often provided by third-party database servers. Besides XML-based formats, XMLHttpRequest can be used to process data in other formats such as HTML, JSON, or plain text. For example, the Google Maps API uses JavaScript XMLHttpRequest objects to send HTTP requests and receive responses from the server. The Google Maps API is one of the most widely known Web APIs among current Web 2.0 Mashup applications.

XML-RPC
XML-RPC is a remote procedure call protocol that uses HTTP as the transport protocol. It is a protocol for exchanging XML-based messages in a distributed environment. XML-RPC provides a standard that lets heterogeneous programs communicate with each other regardless of their implementation language and system platform. An XML-RPC message is an HTTP POST request. The request executes on the server, and the body of the request is encoded in an XML-based format. The response value of the request is also formatted in XML.
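To make the message format concrete, the following sketch uses Python's standard `xmlrpc.client` module to serialize a call and a response without contacting any server. The method name `schedule.getItems` and the returned fields are hypothetical; only the XML-RPC encoding itself is standard.

```python
import xmlrpc.client


def encode_call(method, *params):
    """Serialize a method call as the XML body of an HTTP POST request."""
    return xmlrpc.client.dumps(params, methodname=method)


def encode_response(value):
    """Serialize a single return value as an XML-RPC methodResponse."""
    return xmlrpc.client.dumps((value,), methodresponse=True)


if __name__ == "__main__":
    # "schedule.getItems" is an assumed method name used only for illustration.
    request_xml = encode_call("schedule.getItems", "wp321892")
    print(request_xml)

    response_xml = encode_response({"subject": "Day Trip"})
    params, method = xmlrpc.client.loads(response_xml)
    print(params[0]["subject"])
```

In a real deployment the request body would be sent with `xmlrpc.client.ServerProxy`, which performs the same serialization before issuing the HTTP POST.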

Simple object access protocol
Simple Object Access Protocol (SOAP) is another communications protocol for Web services that emerged immediately after XML-RPC. It was developed to address some of the limitations of XML-RPC: XML-RPC supports only RPC over HTTP, is not easily extensible, and does not support WSDL. SOAP adopts XML Base [Marsh (2001)] to determine a base URI for relative URI references used as values in message items. A SOAP binding describes how an underlying protocol is used to transport SOAP messages. Most current Web Services adopt SOAP over HTTP.
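For comparison with XML-RPC, a minimal sketch of a SOAP 1.1 request envelope can be built with Python's standard `xml.etree.ElementTree`. The application namespace `http://example.org/ulss`, the operation `GetSchedule`, and the parameter `userId` are hypothetical; only the envelope namespace is the standard SOAP 1.1 one.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/ulss"                      # hypothetical service namespace


def build_soap_request(operation, params):
    """Return a SOAP envelope whose Body holds one operation element."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_soap_request("GetSchedule", {"userId": "wp321892"}))
```

Under a SOAP-over-HTTP binding, this string would form the body of an HTTP POST request; the binding details are not shown here.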

Representational state transfer
Representational State Transfer (REST) is a style of software architecture that describes how Web resources, such as web services, web pages, text, databases, or websites, are defined and addressed. REST is often used in a looser sense to describe service interfaces. Many current Web services are developed in the REST style and are called REST-based Web services. The main advantages of REST-based Web services are that they are lightweight, human readable, and easy to build [Barbaglia, Murzilli and Cudini (2017)].

A SPARQL endpoint [... (2017)] is a conforming SPARQL protocol service, which enables users to query an RDF-based dataset with the SPARQL language. The SPARQL Annotations in WSDL (SPDL) provides a specification that allows a SPARQL query to indicate a specific URL, with associated parameters, to invoke web services and bind the returned information to SPARQL results. Additionally, some existing projects [Sbodio, Martin and Moulin (2010)] provide SPARQL Web Services or APIs that let different programming languages invoke SPARQL queries.
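As a small illustration of how a REST-style request to a SPARQL endpoint is addressed, the sketch below builds the URL for a query sent via HTTP GET, using the `query` parameter defined by the SPARQL protocol. The query text and the helper name `sparql_get_url` are assumptions made for this example, and the request is constructed but not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint

# Illustrative query: the English abstract of the DBpedia resource for Taipei.
TAIPEI_ABSTRACT_QUERY = """
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Taipei> <http://dbpedia.org/ontology/abstract> ?abstract .
  FILTER (lang(?abstract) = "en")
}
""".strip()


def sparql_get_url(endpoint, query):
    """Encode a SPARQL query as the 'query' parameter of a GET request URL."""
    return f"{endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    url = sparql_get_url(DBPEDIA_ENDPOINT, TAIPEI_ABSTRACT_QUERY)
    print(url)
    # Decoding the URL recovers the original query text.
    print(parse_qs(urlsplit(url).query)["query"][0] == TAIPEI_ABSTRACT_QUERY)
```

The resource being addressed is the endpoint itself, and the whole query travels in the URL, which is what makes this interaction easy to test from a browser or a command-line HTTP client.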

Web 2.0 mashups as a service
The components of WMaaS, shown in Fig. 1, are described in the previous section. This section describes how WMaaS can act as a service model and be associated with the existing service models of cloud computing. This study presents a stack framework, shown in Fig. 3, to locate and represent the relevant service models of cloud computing. The IaaS, PaaS, and SaaS layers represent the current service models of cloud computing. The top layer consists of cloud devices, which are increasingly connected to cloud SaaS applications. Therefore, the same web content needs to be rendered differently on various cloud devices. Heterogeneity issues span all of the upper three layers. Four characteristics of cloud computing lead to heterogeneity issues, and WMaaS is used to cope with them, as summarized in Tab. 5. WMaaS is an extension of Web 2.0 Mashups in which cloud resources are given well-defined meaning, better enabling SaaS, PaaS, IaaS, and the various cloud participants to work in cooperation. Additionally, WMaaS can be combined with the existing cloud service models, SaaS, PaaS, and IaaS, to facilitate the development of cloud computing applications. Fig. 4 shows the service-oriented architecture, which is associated with the various cloud participants and the different cloud computing service models.

Ubiquitous location-based service system based on WMaaS
To demonstrate the feasibility of WMaaS, we implemented a Ubiquitous Location-based Service System (ULSS), a cloud computing application that provides continuous, location-based schedule information for personal needs. The main components of ULSS are the Location-Based Service Platform (LBS Platform), the GPS Network, and the Cloud Client. The LBS Platform consists of the ULSS Portal Website, which is deployed in a Hadoop cloud computing environment and serves inquiries from validated community members. In ULSS, GPS information is collected through the mobile phone and transferred to the LBS Platform for storage. The dataflow-oriented architecture of ULSS is depicted in Fig. 5. The WMaaS technologies used in ULSS are summarized in Tab. 6. The schedule and GPS location information are described in XML format for data sharing and transformation.

Table 6: WMaaS technologies used in ULSS
Technology | WMaaS component | Use in ULSS
Atom | Web Feed (cloud resource) | The Google Calendar information is described in Atom format to support the Google Calendar API.
XML | Open Data (cloud resource) | The PM2.5 Open Data contains real-time PM2.5 information presented in XML, which is provided by the Open Government Data of Taiwan.
RDF | Linked Data (cloud resource) | The introductory information about Taipei city is presented as RDF-based Linked Data, which is offered by DBpedia.
AJAX | Web Interaction | A sample client was built using XHTML and AJAX to query the Google Calendar and display the location information in real time.
XHTML | Web Presentation | The location-based service information is rendered in XHTML for desktop PCs.

Hadoop Cloud Computing is a PaaS for developing web applications that provides Web APIs to retrieve various services, including the data store, Google Accounts, Google Map, Google Calendar, and Google email. The Hadoop cloud computing environment also provides a web-based administration console with which SaaS developers can easily build, maintain, and extend their web applications.

The Location-Based Service Platform (LBS Platform) is a website that provides dynamic transcoding of, and inquiry into, schedule information. It uses XML-based documents and web services technologies to facilitate the reusability of schedule information. The LBS Platform is built in a Hadoop cloud computing environment and is composed of the Transcoding Agent, user information, and schedule information. The Transcoding Agent listens for cloud client requests and acquires schedule information from the data store. It then converts the schedule information into an XML-based document that is accepted by the cloud client. Additionally, the Transcoding Agent can call the Web API and the SPARQL endpoint to access the Open Data and Linked Data.

The GPS Network is composed of GPS satellites, GPS receivers, and Wi-Fi/GPRS base stations. The mobile phone serves as a GPS receiver that receives location information from GPS stations. The mobile phone is connected to ULSS, and the location information is sent through the Wi-Fi/GPRS network.

The Cloud Client interacts with the LBS Platform through Internet connections to retrieve the schedule information. Various client devices, including desktop PCs, personal digital assistants (PDAs), mobile phones, and notebooks, are increasingly connected to the Internet. The same schedule information needs to be rendered differently on various client devices.
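The idea of rendering the same schedule information differently per client device can be sketched as follows. This is a minimal, assumed illustration of a transcoding step, not the Transcoding Agent itself: the function names, the `mobile`/`desktop` device labels, and the dictionary fields are hypothetical, and the Atom entry is deliberately minimal rather than a complete feed.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def to_atom(item):
    """Render one schedule item as a minimal Atom entry."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = item["id"]
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = item["subject"]
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = item["time"]
    position = f'{item["latitude"]}, {item["longitude"]}'
    ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = position
    return ET.tostring(entry, encoding="unicode")


def to_xhtml(item):
    """Render the same schedule item as a small XHTML page."""
    html = ET.Element(f"{{{XHTML_NS}}}html")
    head = ET.SubElement(html, f"{{{XHTML_NS}}}head")
    ET.SubElement(head, f"{{{XHTML_NS}}}title").text = item["subject"]
    body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
    ET.SubElement(body, f"{{{XHTML_NS}}}h1").text = item["subject"]
    position = f'{item["time"]} at {item["latitude"]}, {item["longitude"]}'
    ET.SubElement(body, f"{{{XHTML_NS}}}p").text = position
    return ET.tostring(html, encoding="unicode")


# Hypothetical device labels mapped to output formats.
TRANSCODERS = {"mobile": to_atom, "desktop": to_xhtml}


def transcode(item, device):
    """Pick the output format according to the requesting device type."""
    if device not in TRANSCODERS:
        raise ValueError(f"no transcoder for device type: {device}")
    return TRANSCODERS[device](item)


if __name__ == "__main__":
    sample = {
        "id": "urn:example:mu78919011",  # hypothetical identifier
        "subject": "Day Trip",
        "time": "2018-04-06T16:12:16Z",
        "latitude": "23.702141",
        "longitude": "120.429481",
    }
    print(transcode(sample, "mobile"))
    print(transcode(sample, "desktop"))
```

Keeping the device-to-format mapping in one table means that supporting a new client type, for example a WML renderer, would only require registering one more function.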

Figure 5: The dataflow-oriented ULSS architecture

The following steps explain the message flow illustrated in Fig. 5:
1. Each user has a personal mobile device associated with a unique user ID as the personal identification.
1.1 The mobile phone receives the location information from GPS satellites.
1.2 The location information is encoded as an XML-based document, as shown in Fig. 6, and the XML-based document is then imported into the LBS Platform through the Wi-Fi/GPRS base stations.
2. The LBS Platform filters the personal location information and then calls the Google Calendar API to save it into the personal Google calendar.
3. This step is a pull-based interaction scheme that accomplishes the following tasks:
3.1 The various cloud devices, such as a mobile phone or desktop PC, can send a request with the user ID to the LBS Platform to browse the personal schedule information.
3.2 The LBS Platform invokes the Transcoding Agent to acquire the schedule information from the data store and then converts this information into an XML-based schedule document.
3.3 The Transcoding Agent calls the Google Calendar API to acquire the personal Google calendar.
3.4 The Transcoding Agent calls the Google Map API to get Google Map information.
3.5 The Transcoding Agent accesses the PM2.5 Open Data by calling the Web API of Taiwan's Open Government Data. It then parses the PM2.5 Open Data to extract local PM2.5 information based on the current location of the user.
3.6 The Transcoding Agent accesses the Taipei city Linked Data by calling the SPARQL endpoint of DBpedia. It then parses the RDF-based Linked Data to get the introduction to Taipei city.

The above information, including the personal schedule, local PM2.5 information, Taipei city introduction, and Google map, is converted to various XML-based documents, such as an Atom document (shown in Fig. 7) or an XHTML document, for display on the mobile phone and desktop PC, respectively.

<?xml version="1.0" encoding="utf-8" ?>
<schedule id="mu78919011">
  <user id="wp321892">
    <name>Yii-Ching Shue</name>
    <title>GPS satellite positioning</title>
    <item>
      <subject>Day Trip</subject>
      <time>Fri, 06 April 2018 16:12:16 GMT</time>
      <device>iPhone (3289-329-7810A)</device>
      <longitude>120.429481</longitude>
      <latitude>23.702141</latitude>
      …..
    </item>
  </user>
</schedule>
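Step 1.2 of the message flow, encoding a GPS fix as an XML-based document, can be sketched with Python's standard `xml.etree.ElementTree`. The element names follow the XML document printed in this section; the function names and the choice of six decimal places are assumptions, and the elided fields of the original document are not reproduced.

```python
import xml.etree.ElementTree as ET


def build_location_document(schedule_id, user_id, name, subject,
                            time, device, longitude, latitude):
    """Encode one GPS fix as a schedule document like the one shown in the text."""
    schedule = ET.Element("schedule", id=schedule_id)
    user = ET.SubElement(schedule, "user", id=user_id)
    ET.SubElement(user, "name").text = name
    ET.SubElement(user, "title").text = "GPS satellite positioning"
    item = ET.SubElement(user, "item")
    fields = (
        ("subject", subject),
        ("time", time),
        ("device", device),
        ("longitude", f"{longitude:.6f}"),  # six decimals is an assumed precision
        ("latitude", f"{latitude:.6f}"),
    )
    for tag, value in fields:
        ET.SubElement(item, tag).text = value
    return ET.tostring(schedule, encoding="unicode")


def read_position(document):
    """Parse the document and return (longitude, latitude) as floats."""
    item = ET.fromstring(document).find("./user/item")
    return float(item.findtext("longitude")), float(item.findtext("latitude"))


if __name__ == "__main__":
    doc = build_location_document(
        "mu78919011", "wp321892", "Yii-Ching Shue", "Day Trip",
        "Fri, 06 April 2018 16:12:16 GMT", "iPhone (3289-329-7810A)",
        120.429481, 23.702141,
    )
    print(doc)
    print(read_position(doc))
```

Building the document through an XML library rather than string concatenation guarantees matching open and close tags and properly quoted attributes.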

Evaluation
This section evaluates the Ubiquitous Location-based Service System (ULSS) for Web 2.0 Mashups as a Service (WMaaS) against our requirements. The requirements include heterogeneity and performance, which have been mentioned in Section 1.

Heterogeneous evaluation
The heterogeneity assessment covers Cloud Resources independence, Web Presentation independence, and Web Interaction independence. The study adopts WMaaS as a generalized architecture for cloud computing applications. Tab. 6 shows significant comparisons between the Web 2.0 technologies and WMaaS with respect to heterogeneity. Independence from the platform and the hardware allows a lightweight, simplified evolution of more complex web-based applications in the cloud computing ecosystem. Furthermore, the constructs in the proposed WMaaS and ULSS are not specifically designed to match one particular cloud computing application. Therefore, they can support the heterogeneity needed to develop various Web 2.0-based cloud computing applications.

Performance evaluation
This section presents a preliminary experiment that evaluates the performance of ULSS in a Hadoop cloud computing environment. The experiment employs the Hadoop Distributed File System (HDFS) as the cluster file system, which can be set up to replicate data automatically, thus minimizing the risk of data loss. As shown in Fig. 8, the Hadoop/Spark cluster is composed of a master node and six data nodes. Each node has an Intel Core i7-8700 CPU, 32 GB of memory, and a 2 TB hard disk. The cluster manager is the resource manager responsible for allocating resources. Spark has a built-in Standalone resource manager, and developers can also choose other resource managers according to their requirements. The study performs implementation and testing in three different cluster computing environments: Spark Standalone, Spark on YARN, and Spark on Mesos. The same experiment is executed on each of the three. The experiment evaluated ULSS as a personal schedule broker that processed data sizes from 1 GB to 7 GB.

Fig. 9 shows the integrated test results obtained from Spark on Standalone, YARN, and Mesos, which indicate that ULSS works more efficiently under YARN mode than under either Standalone or Mesos. This is mainly because YARN achieves better resource scheduling than Spark Standalone and provides different schedulers to choose from, such as the Capacity Scheduler and the Fair Scheduler. Moreover, YARN is suitable for running with large numbers of nodes and highly complex data. Mesos is mainly responsible for providing proper resources for the assigned task, which the original application uses to run executors. Therefore, Mesos can be adopted to run multiple computing services. It can allocate resources dynamically through its fine-grained mode, thus avoiding idle allocated resources.

Notably, the threshold for Spark was about 5 GB. When the dataset size was below this threshold, computing time showed a significant linear trend with dataset size. Conversely, when the dataset size exceeded this threshold, computing time increased very rapidly.
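To indicate how the same job can be pointed at the three cluster managers compared above, the sketch below assembles `spark-submit` command lines that differ only in the `--master` value. The host name `master.example.org` and the application file `ulss_job.py` are hypothetical, the ports are the usual defaults for a Spark Standalone master (7077) and a Mesos master (5050), and the commands are only constructed, not executed. The actual submission settings used in the experiment are not given in the text.

```python
def master_url(mode, host="master.example.org"):
    """Return the --master value for one of the three cluster managers."""
    if mode == "standalone":
        return f"spark://{host}:7077"   # default Standalone master port
    if mode == "yarn":
        return "yarn"                    # cluster location comes from the Hadoop configuration
    if mode == "mesos":
        return f"mesos://{host}:5050"   # default Mesos master port
    raise ValueError(f"unknown cluster manager: {mode}")


def spark_submit_command(mode, app="ulss_job.py"):
    """Build the argument list for submitting the same application under a given mode."""
    return ["spark-submit", "--master", master_url(mode), app]


if __name__ == "__main__":
    for mode in ("standalone", "yarn", "mesos"):
        print(" ".join(spark_submit_command(mode)))
```

Because only the master setting changes, the application code can stay identical across the three environments, which is what makes a like-for-like timing comparison possible.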

Conclusion and future work
This study proposed a novel Web 2.0 Mashups as a Service, called WMaaS. WMaaS is a fundamental cloud service model that is built on an XML-based Mashups Architecture (XMA) to address the heterogeneity issues of cloud computing. Additionally, WMaaS can be associated with the existing service models, SaaS, PaaS, and IaaS, to facilitate the development of applications for organization monitoring and end-user needs. The main purpose of this study was to investigate how Web 2.0 Mashups technologies can be used to develop a novel service model for the cloud computing ecosystem. This study argues that Web 2.0 Mashups can be adopted as a common scheme for integrating cloud resources uniformly through a fundamental cloud service model. This study also demonstrates that XML is a core technology of Web 2.0 Mashups. However, the main limitation of XML is that it emphasizes syntax and format rather than semantics and knowledge. XML provides an application-independent, syntactic structure for describing data and resources. Even though XML offers the advantage of a surface syntax for structured Web Data and Web APIs, it lacks the computer interpretability needed to support knowledge representation in the development of organizational and end-user computing applications. One direction for future work is to investigate how to integrate Semantic Web technologies [De Vocht, Softic, Verborgh et al. (2017); Selvan, Vairavasundaram and Ravi (2019)], such as RDF Schema, OWL, and ontologies, into WMaaS to facilitate the development of intelligent cloud computing.