Review of the Complexity of Managing Big Data of the Internet of Things



Introduction
The fast-developing and expanding area known as the Internet of Things (IoT) [1][2][3] involves extending the Internet beyond standard devices such as computers, smartphones, and tablets to include the connection of other physical devices and objects. This allows a variety of devices, sensors, etc. to be monitored and controlled, and to interact and communicate via the Internet. This means that an abundance of opportunities arises for brand new and revolutionary types of services and applications. As a result, we are now witnessing a technological revolution in which millions of people are connecting and generating tremendous amounts of data through the increasing use of a wide variety of devices. These include smart devices and any type of wearable connected to the Internet, powering novel connected applications and solutions. The cost of technology has decreased sharply, making it possible for everybody to access the Internet and to gather data and an abundance of real-time information.
One immediate consequence of this revolutionary emergence of novel technological opportunities is the urgent need for the development and adaptation of other related areas to further enable the development of the IoT field. Thus, new words and expressions have started to emerge, such as Big Data [4,5], cloud computing [6], and data science. Data science has been defined as a "concept to unify statistics, data analysis, machine learning and their related methods" to "understand and analyse actual phenomena" with data [7,8], and there is now a strong demand for professional data scientists in a multitude of sectors [9][10][11][12].
This article aims at providing a review of IoT-related surveys in order to highlight the opportunities and the challenges, as well as the state-of-the-art technologies related to Big Data. There will be a particular focus on how to address the arising problems of managing the ensuing increased complexity. Since it is such a complex area, we have divided the Big Data procedure into several different stages to establish the most important points in each, while highlighting to the reader the most relevant papers related to every stage. Each of these stages is treated in a separate section. Our contribution explicitly indicates the advantages of every stage in the knowledge discovering procedure, in contrast to approaches that offer more general visions. The advantage of this proposal is that it makes it possible to understand and analyse the challenges and opportunities in every particular phase. The remainder of this article is structured as follows. First, the next section discusses a set of general approaches to handle the complexity of managing Big Data in the context of the IoT, as well as the future trends in the development of these approaches. Then a section follows that discusses the knowledge discovering procedure in data gathered from a large number of diverse devices in the context of the IoT. Finally, we provide a conclusion that summarises the article and points out major future trends.

The Internet of Things and Complexity Handling: Architectures for Big Data
The Internet of Things (IoT) paradigm has brought a great revolution to our society [13][14][15]. It is a technology that makes our world better. It allows us to obtain information about the physical environment around us, and from this data valuable knowledge can be inferred about how the world works. This knowledge enables the deployment of new real-world applications and makes it easier to take smart decisions that improve the quality of life of citizens. There are many examples of how this novel technology is applied. The smart city concept is a representative use case, where many applications have been developed for its ecosystem [16][17][18][19].
An important source of complexity within the IoT paradigm comes from the great amount of data collected. In most cases, the data also need to be processed in order to be converted into useful knowledge.
In view of the recent proposals on how to handle the complexity of Big Data, there are three general approaches to carry out the ensuing very intensive data processing: (A) local processing; (B) edge computing; and (C) cloud computing. Figure 1 shows a schematic overview of these approaches, and Table 1 summarises a representative set of ways and aspects of handling the complexity arising from the IoT. Table 1 also provides references to corresponding papers, categorised under the headings of the three general approaches mentioned above. In the following subsections, brief descriptions of each of these approaches are presented, and finally their main future trends are introduced.

Local Processing.
This approach basically consists of processing the data where the data is collected. In this way, no raw data need to be communicated to remote servers. Instead, only the useful and relevant information is centralised to make smart decisions [20,21]. In addition, deploying the first-level intelligence closer to the sensors increases the overall energy efficiency and significantly reduces the communication needs of many IoT applications.
This approach develops the concept of the 'smart sensor,' which was initially defined as a 'smart transducer' [22]. A smart sensor is a sensor with computing and communication capabilities that allow it to perform computations on the acquired data, make decisions, store information for further use, and perform two-way communications [23]. Smart sensors are becoming integral parts of intelligent systems, and they are indispensable enablers of the IoT paradigm and the corresponding development of advanced applications. A typical example of such sensors is the 'smart wearable.' This device can acquire several biosignals, process them, show elaborated information to the user, and send the relevant information to, for example, external platforms for medical supervision [23][24][25]. Other important applications come from the logistics [26] and industrial fields [27]. Indeed, the new computation and communication capabilities of the IoT paradigm allow for the implementation of intelligent manufacturing systems, giving rise to the next generation of industry, the so-called 'Industry 4.0' [28].
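The local processing idea described above can be illustrated with a minimal sketch. It assumes a hypothetical wearable that samples heart rate; the function name `summarise_window`, the sample values, and the alert threshold are illustrative assumptions and do not come from the cited works. The device reduces a window of raw samples to a short summary, so that only the summary would need to be transmitted.

```python
from statistics import mean


def summarise_window(readings, threshold):
    """Reduce a window of raw samples to the few values worth transmitting."""
    return {
        "mean": mean(readings),
        "max": max(readings),
        # Flag the window if any single sample exceeds the threshold.
        "alert": any(r > threshold for r in readings),
    }


# Hypothetical heart-rate samples (beats per minute) from a wearable.
window = [72, 75, 71, 118, 74]
message = summarise_window(window, threshold=110)
print(message)
```

In this sketch, five raw samples are replaced by one three-field message, which is the kind of reduction in communication that local processing aims for.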
Table 1: A representative set of ways and aspects of handling the complexity arising from the IoT, together with references to corresponding papers, categorized under the headings of three general approaches (local processing, edge computing and cloud computing) to carry out intensive data processing of Big Data.

Work | Main contributions

(A) Local processing
Smart sensing for IoT applications [21] | Discusses emerging trends of smart sensing.
Sensor Fusion and Smart Sensor in Sports and Biomedical Applications [24] | An overview of smart sensors and sensor fusion.
High-level modelling and synthesis of smart sensor networks for Industrial Internet of Things [27] | Efficient design process and methodology for complex industrial applications.
Smart Sensing Devices for Logistics Application [26] | Analysis of the logistics sector and Cyber Physical Systems (CPS) as smart connected solutions.
Intelligent Manufacturing in the Context of Industry 4.0: A Review [28] | Review of key technologies such as the IoT and cyber-physical systems.

(B) Edge computing
Edge Computing: Vision and Challenges [34] | Challenges and opportunities in the field of edge computing are described.
Collaborative Working Architecture for IoT-Based Applications [20] | Network design, which combines sensing and processing capabilities based on the MCC paradigm.
IoT-Based Computational Framework [25] | Distributed framework based on the IoT paradigm for real-time monitoring.
Edge Computing [35] | Analysis of the edge computing paradigm.
Fog Computing [48] | The Fog Computing framework.
Secure Multi-Tier Mobile Edge Computing Model [49] | Formal framework to handle the security level of edge computing environments.
Mobile Edge Computing [38] | Analysis of opportunities, solutions, and challenges of the MEC paradigm.
Cloudlets [39] | Introduction to the cloudlet concept for offloading computations.
Future Edge Cloud and Edge Computing for Internet of Things Applications [40] | Discussion of Edge Cloud and Edge Computing research efforts.

(C) Cloud computing
A Smart Sensing Architecture for Domestic Monitoring [50] | Integrated sensor network deployment with advanced Cloud Computing Data Mining algorithms.
IoT-as-a-Service [43] | Strategy for evaluating the information quality in delivering IoT-as-a-Service.
The shift to Cloud Computing [41] | This paper analyses the impact of the shift to the Cloud-based model.
Accessibility analysis in smart cities [46] | Comprehensive system for monitoring urban accessibility in smart cities.
Big data analytics framework for smart cities [47] | Smart City Data Analytics Panel for Big Data analytics.
Sensing and Actuation as a Service Delivery Model [44] | A novel system model for Sensing and Actuation as a Service (SAaaS).
User Quality-Of-Experience and Service Provider Profit in 5G [51] | The Quality-of-Experience (QoE) and the Profit-aware Resource Allocation problems are analysed.
Orchestrated Platform for Cyber-Physical Systems [45] | Discussion on the scalability of the sensor data back-end and the predictive simulation architecture for CPS.

In these environments, network virtualization plays a significant role in providing flexibility and better manageability to the Internet [29]. This is a way of reducing the complexity of the infrastructure, since network resources can be managed as logical services rather than physical resources. This feature enables us to implement smart scheduling methods for network usage and dataflow routing from IoT applications [30].
In order to properly carry out this resource management, network performance monitoring needs to be performed effectively and efficiently. However, this remains a challenge for network operators [31], since the active monitoring techniques used to acquire performance data dynamically can introduce overheads in the network [32]. In general, existing methods are hard to use in practice and further research is needed in this area. Nevertheless, a promising idea to address this challenge consists in reducing the amount of measurement by implementing intelligent measurement schemes based on inference techniques from partial direct monitoring data [33].

Edge Computing.
Edge computing is a novel paradigm which has spawned great interest recently. It consists of the deployment of storage and computing capabilities at the 'Edge' of the Internet. The 'Edge' of the Internet can be defined as the portion of the network between sensors or data sources and cloud data centres [34]. The edge computing paradigm aims at deploying computing, storage, and network resources in this portion. The physical proximity of the computing platforms to where the data acquisition happens makes it easier to provide services with lower end-to-end latency, high bandwidth, and low jitter [35].
There are several ways to implement edge computing that have in turn led to different approaches, such as Fog Computing, Mobile Edge Computing (MEC), and Cloudlet Deployment. Fog Computing consists in using network devices such as routers, switches, and gateways as Fog Nodes to provide storage and computing resources [36]. In addition, network virtualization has significantly contributed to developing this paradigm by considering the fog devices as virtual network nodes. This trend increases the deployment flexibility of Fog Computing services and their integration with mobile devices and 'things' [37]. MEC is a novel paradigm based on deploying cloud computing capabilities in the base stations of the telecom operators [38]. Finally, Cloudlet Deployment follows the same concept as Cloud Computing, but without the inconveniences of the Wide Area Network (WAN). The servers are installed within the local networks where the data sources are connected. These servers are known as cloudlets [39].
Applications for edge computing, such as Virtual Reality and gaming applications [40], cannot tolerate high or unpredictable latency. Remote cloud servers cannot meet this requirement.

Cloud Computing.
The Cloud Computing paradigm is one of the most disruptive technological innovations of the last few years. It makes a flexible amount of computing resources available to anyone under pay-per-use methods, the so-called 'as-a-service' model. Currently, more and more software and hardware solutions are being redesigned for this cloud paradigm [41].
The cloud computing model favours the development of large-sized data centres where the resources are optimised through virtualization and efficient management systems. This technology gives IoT applications the possibility to work in different environments in a very agile way using the same infrastructure [42]. In this way, combining the cloud computing paradigm with the IoT forms a new type of distributed system able to provide IoT-as-a-Service (IoTaaS) [43]. This concept allows for the integration of powerful computing resources with different types of devices such as sensors, actuators, and other embedded devices to deliver advanced services and applications based on the gathered data. Particular instances of this idea are the Sensing and Actuation Cloud, where the connected IoT devices are mainly sensors and actuators [44], and the Cloud Cyber Physical Systems (CPS), composed of sensors or sensor networks [45].
There is a great variety of successful examples of this trend in many areas, where the data are analysed in the cloud through Big Data and data mining methods to infer valuable knowledge from them and deliver rich and smart services to the stakeholders. For example, the smart city concept, mentioned above, is in part made possible by centralised cloud-based data analysis and service provision [41,46,47].
In addition, a combination of these options can be designed taking several aspects into account, such as power consumption, communication networks, and the availability of computing platforms. Dynamic solutions can easily adapt to the most favourable approach in order to better handle the complexity and meet the operating constraints.
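As a hedged illustration of such a dynamic combination, the following sketch selects a processing tier from a latency budget, the battery state, and the availability of edge and cloud platforms. The function `choose_tier`, the latency figures, and the preference for the most remote usable tier are illustrative assumptions, not values or policies taken from the reviewed papers.

```python
def choose_tier(latency_budget_ms, battery_low, edge_available, cloud_reachable):
    """Pick a processing tier under simple, illustrative constraints."""
    # (tier name, assumed round-trip latency in ms, usable right now?)
    candidates = [
        ("local", 5, not battery_low),
        ("edge", 20, edge_available),
        ("cloud", 150, cloud_reachable),
    ]
    usable = [
        name
        for name, latency_ms, ok in candidates
        if ok and latency_ms <= latency_budget_ms
    ]
    # Prefer the most remote usable tier, assuming it offers more resources.
    # If nothing satisfies the constraints, fall back to the device itself.
    return usable[-1] if usable else "local"


print(choose_tier(200, False, True, True))
print(choose_tier(50, False, True, True))
print(choose_tier(10, False, True, True))
```

A real system would measure latency and energy cost at run time instead of using fixed constants; the sketch only shows how the three approaches can be treated as interchangeable targets of one decision.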

Future Trends.
Regarding the future trends of these three general approaches to intensive data processing of IoT-related Big Data, there are developments on several fronts. The following is a summary of the most relevant ones.
When it comes to local processing, the efforts are directed towards the continuous improvement of smart sensor devices. We can distinguish several research lines here. One is the effort to increase the performance of the devices while simultaneously reducing their power consumption. Another is the integration of multiple sensing modalities on the same chip. Still another is the effort directed towards the improvement of the methods employed for the extraction of useful information from the raw data [21].
Edge computing has a promising future since it decentralises the computing power along the network and produces clear benefits when it comes to response time and reliability [34]. The research lines in this field aim at reaching a smooth engagement with the IoT ecosystem, mainly by reducing the management complexity of dispersed edge resources and developing mechanisms to maintain the security perimeter for the data and applications [49].
The cloud computing paradigm has triggered a strong growth of computing services around the world. For this reason, there is intensive ongoing research on expanding cloud services and solutions to new fields of application. These efforts seek to simplify business and make services easier for stakeholders to use. In this context, the new 5G technology will facilitate access to services and applications in the cloud, improving the Quality-of-Experience [51].

Knowledge Discovering Procedure
In Figure 2, a classical procedure of discovering knowledge from the data gathered from a large number of diverse devices is depicted. In this figure, we get an overview of all the stages involved in such a process. There are many challenges involved in these stages that will be described next.

IoT Data Gathering.
The gathering of data for IoT architectures involves collection from different sources like social networks, the web, various devices, software applications, humans, and not least various kinds of sensors. In addition to physical sensors, there are also virtual sensors that are created by the combination and fusion of data from different physical sensors in the cloud [52]. When it comes to the gathering of data from sensors, not only are the raw sensor data collected and stored, but they are also often linked to, for example, relevant contextual information, which increases the value of the data [53]. All these different sources engender large amounts of various types of data, which of course also increases the requirements for storage capacity. The increasingly affordable storage resources that have recently become available mitigate this problem to some extent, though.
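The notion of a virtual sensor enriched with contextual information can be sketched as follows. This is a minimal example under stated assumptions: the fusion rule (a plain average), the function name `virtual_sensor`, and the field names are hypothetical and are not prescribed by [52] or [53].

```python
from statistics import mean


def virtual_sensor(readings, context):
    """Fuse readings from several physical sensors and attach context."""
    return {
        # Assumed fusion rule: the average of the physical readings.
        "value": mean([r["value"] for r in readings]),
        # Keep track of which physical sensors contributed.
        "sources": [r["sensor_id"] for r in readings],
        # Contextual information is stored alongside the fused value.
        **context,
    }


# Hypothetical temperature readings from two physical sensors.
readings = [
    {"sensor_id": "t1", "value": 21.0},
    {"sensor_id": "t2", "value": 23.0},
]
fused = virtual_sensor(readings, {"room": "lab", "unit": "degC"})
print(fused)
```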
Sensor networks are central for realising the IoT, and in order to handle large amounts of polymorphous, heterogeneous sensor data on a large scale, Very Large-Scale Sensor Networks are employed using Cloud Computing [54]. Some of the main challenges regarding Very Large-Scale Sensor Networks are to handle the sensor resources and the computational resources and to store and process the sensor data.
Table 2 provides references to papers focused on the gathering of data in the context of the IoT.

Data Cleaning and Integration.
A consequence of the way information is gathered through various sources and devices within the IoT is that the information varies broadly in structure and type. This leads to a need for integration, which can be defined as a set of techniques used to combine data from disparate sources into meaningful and valuable information.
Integration is one of the most challenging issues of Big Data, and it is associated with one of the most difficult Vs of Big Data, i.e., the variety of data. Table 3 shows a summary of papers that are focused on the problem of variety of information in Big Data.
Moreover, given the current context in which companies are organized, it is not enough to work with internal, local, and private databases. In most cases, there is also a need for the World Wide Web, where many diverse databases and other data sources must interact and interoperate. This circumstance leads us to concepts such as heterogeneity and uncertainty.
Table 4 summarizes papers that deal with integration by means of a diversity of techniques and methods like XML, ontological constructs from knowledge representation, uncertainty, and data provenance.
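To make the integration problem concrete, the sketch below maps two hypothetical sources, one JSON document and one CSV file with different field names and units, onto a common schema. The source formats, the field names, and the target schema are assumptions chosen for illustration; they do not reproduce any of the techniques surveyed in Table 4.

```python
import csv
import io
import json


def from_json(text):
    """Map a JSON source with its own field names onto the common schema."""
    return [
        {"device": d["id"], "temperature_c": float(d["temp"])}
        for d in json.loads(text)
    ]


def from_csv(text):
    """Map CSV rows that report Fahrenheit onto the same schema."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {
            "device": r["device"],
            # Convert Fahrenheit to Celsius, rounded to one decimal.
            "temperature_c": round((float(r["temp_f"]) - 32) * 5 / 9, 1),
        }
        for r in rows
    ]


# Two hypothetical sources describing the same kind of measurement.
json_source = '[{"id": "a1", "temp": "20.5"}]'
csv_source = "device,temp_f\nb2,68\n"
integrated = from_json(json_source) + from_csv(csv_source)
print(integrated)
```

Each adapter hides one source's structure, so adding a further source only requires one more adapter rather than changes to the consumers of the integrated data.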

Data Mining and Machine Learning.
As more devices, sensors, etc. generate large amounts of data within the IoT, the question arises whether there are possibilities of finding hidden information in that data.
Data mining is a process that detects interesting knowledge from information repositories. This process is partly based on methods derived from modern machine learning algorithms adapted to fit Big Data, and it extracts hidden information from, e.g., databases, data warehouses, data streams, time series, sequences, text, the web, and the large amount of valuable data generated by the IoT. Data mining aims at creating efficient predictive and descriptive models of large amounts of data that also generalize to new data [78]. It includes methods such as clustering, classification, time series analysis, association rule mining, and outlier analysis [79]. The precise choice among diverse data mining and machine learning techniques often depends on the taxonomy of the dataset.
Clustering involves unsupervised learning and uses the available structure to group data based on various kinds of similarity measures. Some examples of clustering methods are hierarchical clustering and partitioning algorithms, e.g., K-Means.
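As an illustration of a partitioning algorithm, a plain K-Means loop is sketched below in pure Python. The fixed initial centroids, the 2-D points, and the fixed iteration count are simplifying assumptions; practical implementations typically use randomised initialisation and a convergence test.

```python
def k_means(points, centroids, iterations=10):
    """Plain K-Means on 2-D points using squared Euclidean distance."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c
            else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D measurements forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
print(clusters)
```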
Classification is the process of finding models/functions describing classes that allow the prediction of class membership for new data. Some examples of classification methods are the K-Nearest Neighbour algorithm, Artificial Neural Networks, Decision Trees, Support Vector Machines, Bayesian Methods, and Rule-Based Methods.
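Of the methods listed, the K-Nearest Neighbour algorithm is the simplest to sketch. The example below is a minimal version for 2-D points with majority voting; the training data and the labels "idle" and "active" are hypothetical.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training points.

    `train` is a list of ((x, y), label) pairs.
    """
    by_distance = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2
        + (item[0][1] - query[1]) ** 2,
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled observations.
train = [
    ((1, 1), "idle"), ((1, 2), "idle"), ((2, 1), "idle"),
    ((8, 8), "active"), ((9, 8), "active"), ((8, 9), "active"),
]
print(knn_predict(train, (2, 2)))
print(knn_predict(train, (7, 9)))
```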
In time series analysis, meaningful properties are extracted from data over time, and in association rule mining, association rules are detected based on attribute-value conditions that are found frequently in the dataset.
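The two standard measures used to judge an association rule, support and confidence, can be computed as in the following sketch. The attribute-value observations are hypothetical smart-home data invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical attribute-value observations from a smart home.
transactions = [
    {"motion=on", "light=on"},
    {"motion=on", "light=on", "heating=on"},
    {"motion=on"},
    {"heating=on"},
]
rule_support = support(transactions, {"motion=on", "light=on"})
rule_confidence = confidence(transactions, {"motion=on"}, {"light=on"})
print(rule_support, rule_confidence)
```

Here the rule "motion=on implies light=on" holds in two of the four observations and in two of the three observations that contain "motion=on".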
Outlier analysis detects patterns that differ significantly from the main part of the data. The methods used are based on properties such as the density distribution or the distances between the instances in the data.
Describes an approach based on the cloud for worldwide implementation of Internet of Things.
[56] Health monitoring and management using Internet-of-Things (IoT) sensing with cloud-based processing: Opportunities and challenges | Emphasis on the opportunities and challenges for the IoT and its future perspective in the health care area.
[  In this paper, it is indicated how beneficial Deep Learning could be for several aspects of Big Data pattern recognition, analytics, semantic, etc.
[63] On the use of MapReduce for imbalanced big data using Random Forest | In this experimental study, the performance of the Random Forest classifier with the MapReduce scheme is examined in order to deal with imbalanced datasets.
Table 5 provides a summary of, and references to, papers focusing on machine learning and data mining in the context of Big Data.
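The distance-based view of outlier analysis described in this section can be sketched as follows: a value is flagged when the distance to its k-th nearest neighbour exceeds a threshold. The choice of k, the threshold, and the sample readings are assumptions made for illustration only.

```python
def knn_distance_outliers(values, k=2, threshold=5.0):
    """Flag values whose k-th nearest neighbour is farther than `threshold`."""
    outliers = []
    for i, v in enumerate(values):
        # Distances from this value to every other value, smallest first.
        distances = sorted(
            abs(v - other) for j, other in enumerate(values) if j != i
        )
        if distances[k - 1] > threshold:
            outliers.append(v)
    return outliers


# Hypothetical temperature readings with one faulty sample.
readings = [20.1, 20.4, 19.8, 20.0, 35.2, 20.3]
print(knn_distance_outliers(readings))
```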

Deep Learning.
In recent years, deep learning has become an important technology for solving a wide range of machine learning tasks [85]. There are applications for natural language processing [86], signal processing [87], and video analysis that allow for the achievement of significantly better results than the state-of-the-art baselines. Also, deep learning is a very useful tool for processing large volumes of data [62]. Because of its high efficiency in processing data obtained from complex sensing environments at different spatial and temporal resolutions, deep learning is a suitable tool for analysing real-world IoT data. According to Gartner's Top 10 Strategic Technology Trends for 2017 (https://www.gartner.com/smarterwithgartner/gartners-top-10-technology-trends-2017/), deep learning and IoT will become one of the most strategic technological two-way relationships: the IoT side produces large volumes of data that require the advanced analytics offered by the deep learning side. A wide range of deep learning architectures [88] finds applications for processing the data from IoT environments: convolutional networks for image analysis, recurrent networks for signal processing, autoencoders for denoising, and feed-forward networks for classification and regression. Figure 3 represents a general architecture of deep learning.
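To make the idea of a network with several hidden layers concrete, the following sketch computes the forward pass of a small feed-forward network in pure Python. The layer sizes and the hand-picked weights are purely illustrative assumptions; a real network would learn its weights from data and would normally be built with one of the deep learning frameworks.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs through each layer in turn."""
    for weights, biases, activation in layers:
        inputs = dense(inputs, weights, biases, activation)
    return inputs


# Hypothetical hand-picked weights; a trained network would learn them.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),  # hidden layer 1
    ([[1.0, 1.0]], [-1.0], relu),                   # hidden layer 2
    ([[2.0]], [0.0], sigmoid),                      # output layer
]
output = forward([2.0, 1.0], layers)
print(output)
```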
[65] Answering queries using views: A survey | This paper presents a survey of important methods that are employed to answer queries using views.
[66] MiniCon: A scalable algorithm for answering queries using views | In this paper a survey of methods for efficient and comprehensive answering of queries using views is presented.
[67] XQuery: the XML query language | This paper introduces the XML query language XQuery.
[68] From semistructured data to XML: Migrating the Lore data model and query language | This paper discusses the adaptation to XML of databases and semistructured languages.
[69] Querying XML streams | In this paper a construct called TurboXPath, similar to x-scan, is used for processing hierarchical "native XML" data pages written to disk.
[70] Semantic integration: a survey of ontology-based approaches | This paper provides a survey of ontology-based approaches to semantic integration.
[71] Learning to map between ontologies on the semantic web | This paper presents assisting tools for the mapping between ontologies on the semantic web.
[72] Containment of conjunctive queries on annotated relations | This paper indicates the relationships between different provenance formalisms.
[73] Perm: Processing provenance and data on the same data model through query rewriting | This paper presents a provenance model similar to that of semi-rings focusing on supporting other operators such as semi-joins.
[74] Google fusion tables: web-centered data management and collaboration | A presentation of a cloud-based system that facilitates the integration of data on the web. Datasets, e.g. in the form of CSV files or spreadsheets, can be uploaded to the system and made public or shared with collaborators.
[75] Global detection of complex copying relationships between sources | Methods that are developed to detect copying relationships between sources in order to find the number of independent occurrences of facts are discussed in this paper.
[76] Crowdsourcing systems on the world-wide web | A survey to get a global picture of crowdsourcing systems on the Web is presented in this paper.
[77] A Novel Multidimensional Approach to Integrate Big Data in Business Intelligence | In this paper, an approach for integrating different formats into the recent RDF Data Cube format is presented. The approach is based on a MapReduce paradigm.

Deep learning frameworks offer different execution models, running standalone or utilizing high-performance computing based on, e.g., Hadoop or a Spark cluster, which allows for reduced computation times. The frameworks have been widely compared and the reviews can be found online (https://dzone.com/articles/8-best-deeplearning-frameworks) (https://www.exastax.com/deep-learning/a-comparison-of-deep-learning-frameworks/). It should be noted that these frameworks implement a processing model where the data are transferred to a server that performs the analysis, and in a final stage the response is returned. This model is subject to latency that may not be acceptable in some applications with requirements for high reliability, for example, autonomous cars [89]. Thus, if efficiency constraints require real-time data processing, a particular implementation of the algorithm is made on a local node. In its basic setting, this solution does not allow the use of information from other sources. An example of on-node processing has been presented in [90], where on-node spectral domain preprocessing is used before the data is passed on to the deep learning framework for Human Activity Recognition.
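As a hedged sketch of what on-node spectral domain preprocessing might look like, the example below converts a window of sensor samples into magnitude-spectrum features with a direct discrete Fourier transform. This is not the method of [90]; the window length, the synthetic signal, and the function name `magnitude_spectrum` are assumptions chosen to keep the example self-contained.

```python
import cmath
import math


def magnitude_spectrum(window):
    """Normalised DFT magnitudes for frequency bins 0 .. n/2 of a real signal."""
    n = len(window)
    return [
        abs(
            sum(
                x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(window)
            )
        )
        / n
        for k in range(n // 2 + 1)
    ]


# Hypothetical accelerometer window: a constant offset plus one sine cycle.
window = [1.0 + math.sin(2 * math.pi * t / 8) for t in range(8)]
features = magnitude_spectrum(window)
print([round(f, 3) for f in features])
```

The resulting short feature vector, rather than the raw window, is what a node would pass on to the learning stage. On a constrained device a fast Fourier transform would normally replace the direct sum used here.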
For the IoT, deep analytics are performed on large data collections and are usually based on creating more descriptive features of the processed objects. For example, in temporal data processing for indoor location prediction [91], a Semisupervised Deep Extreme Learning Machine algorithm has been proposed that improves the localisation performance. A wireless positioning method has been improved with the use of a Stacked Denoising Autoencoder, which also improves the performance by creating reliable features from a large set of noisy samples [92]. The prediction of home electricity power consumption has been analysed with a deep learning system that automatically extracts features from the captured data and optimises the electricity supply of the smart grid [93].
In Edge Computing with the analytics performed by a deep learning cluster [94], the resource consumption has been efficiently reduced [95]. Convolutional neural networks with automatically created features appeared to be a very good solution for privacy preservation [96]. Also in the security domain, deep learning finds many applications, e.g., it allows the construction of a model-based attack detection architecture for cyber-attack detection in fog-to-things computing [97].
Video analysis integrated in IoT networks is strongly supported by neural networks, e.g., deep learning-based visual food recognition allows for the construction of a system employing an edge computing-based service for accurate dietary assessment [98]. RTFace, a mechanism for denaturing video streams, is based on a Deep Neural Network for face detection [99]. It selectively blurs faces and enables privacy management for live video analytics.

Classification, Prediction, and Visualization.
This section discusses the final stage in the chain of the "Procedure for Knowledge Discovery," which is the obtainment of the final knowledge extracted from the raw data.
When employing machine learning methods for classification and prediction, it is important to use methods with a good ability to generalize. The reason for this is that, after any of the aforementioned techniques has been trained on the original data, we want it to make good classifications and predictions for novel data, rather than only for the data used for training.
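The difference between performance on the training data and on novel data can be made visible with a held-out set, as in this minimal sketch. The one-dimensional threshold classifier and all sample values are hypothetical and serve only to show the evaluation pattern.

```python
def fit_threshold(samples):
    """Learn a 1-D decision threshold: the midpoint between the class means."""
    low = [x for x, label in samples if label == 0]
    high = [x for x, label in samples if label == 1]
    return (sum(low) / len(low) + sum(high) / len(high)) / 2


def accuracy(threshold, samples):
    """Fraction of samples whose predicted class matches the true label."""
    return sum((x > threshold) == bool(label) for x, label in samples) / len(samples)


# Hypothetical labelled data: the model never sees `held_out` during fitting.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
held_out = [(2.5, 0), (4.0, 0), (6.0, 1), (4.5, 1)]

threshold = fit_threshold(train)
print(accuracy(threshold, train), accuracy(threshold, held_out))
```

In this example the classifier is perfect on the training data but misclassifies one held-out sample, which is why the held-out score is the more honest estimate of how the model will behave on novel data.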
After machine learning methods have been applied, it is crucial to know how to interpret their outputs and to understand what these mean and how they improve the knowledge in each application area. To that end, visualization methods are employed. Such methods are widely used within Big Data scenarios, as they are very helpful for all types of graphical interpretations when the Volume, Variety, or Velocity is complex. In Table 6, we present a summary of, and references to, papers that deal with visualization.

Conclusion
As indicated by the journal articles and the conference papers we have reviewed in this article, the complexity of Big Data is an urgent topic and the awareness of this is growing. Consequently, a lot of research is being carried out on it, and we will in all likelihood see more and more progress in this field during the next few years.
Additionally, a key issue that we want to emphasize in this study is the set of aspects related to Big Data that transcend the academic area and are, therefore, reflected in companies. An observation is that more than 50% of 560 enterprises think Big Data will help them increase their operational efficiency, among other things [60]. This indicates that there are a lot of opportunities for Big Data. However, it is also clear that there are many challenges in every phase of the knowledge discovery procedure that need to be addressed in order to achieve continued and successful progress within the field of Big Data.
The main objective described in this paper is analytic methods and how they are used for Big Data, in particular, the ones related to unstructured data.
[101] Key Performance Indicators: Developing, Implementing, and Using Winning KPIs | This book represents a guide with tools and procedures to discover the KPIs and how they are developed and used.
[102] The visual organization: data visualization, Big Data, and the quest for better decisions | The paper describes data visualization myths, such as: that all data must be visualized, when in fact only good data should be visualized; that visualization will always manifest the right decision or action; and that visualization will lead to certainty.
[103] Big Data and Visualization: Methods, Challenges and Technology Progress | The paper presents applications and the technological progress of Big Data visualization, and discusses its challenges.
[104] Big-Data Visualization | A special issue which focuses on the current situation and new trends of Big Data Visualization.

As is shown in Figure 1, there are three general approaches when carrying out intensive data processing in IoT architectures: (a) local processing, (b) edge computing, and (c) cloud computing. The text explained each of these approaches in more detail.
We also explained the knowledge discovery procedure by dividing it into several stages as shown in Figure 2. These steps are IoT Data Gathering, Data Cleaning, Integration, Machine Learning, Data Mining, Classification, Prediction, and Visualization.
We have also discussed that many research papers are focused on the variety of information because this is in itself, in conjunction with integration, one of the most challenging issues when it comes to the IoT. This is also the reason why it is very often associated with one of the most difficult Vs of Big Data, which is the variety of data.
The trend for the future seems to be that more investigations will be carried out in such areas as (a) techniques for data integration, again the V of Variety; (b) more efficient machine learning techniques on big data, such as Deep Learning and frameworks such as Apache's Hadoop and Spark, that will probably have a crucial importance; and (c) the visualization of the data, with, e.g., dashboards, and more efficient techniques for the visualization of indicators.

Figure 1: A schematic depiction of three different approaches to handle the complexity of the intensive data processing arising as a consequence of the tremendous amounts of collected IoT data.

Figure 2: A classical procedure for the discovery of knowledge based on data gathered from a large number of diverse devices.

Figure 3: As can be seen in the figure, a Deep Learning Neural Network contains several hidden layers. Often the heavy computations are run on GPUs or clusters of GPU servers.

Table 2: The table summarizes and refers to a representative set of papers focusing on the gathering of data in the context of the IoT.

Table 3: The table summarizes and refers to a representative set of papers focusing on the variety of information in the context of the IoT.

Table 4: The table summarizes and refers to a representative set of papers focusing on Data Integration in the context of the IoT.

Table 5: The table summarizes and refers to a representative set of papers focusing on Machine Learning and Data Mining in the context of Big Data.

Table 6: The table summarizes and refers to a representative set of papers focusing on Visualization and Prediction in the context of Big Data.