1 Introduction

Open Data can be used as fuel for the creation of smart cities [1]. Open urban data and technology can be used to develop services for citizens through open and people-driven innovation models [2]. Web applications can assist in monitoring, analyzing and visualizing social, economic and environmental phenomena [1]. Data-driven initiatives can focus on various types of value [3]. The ability to effectively and efficiently combine, link and share data will determine such data’s value [4].

Big and Open Linked Data (BOLD) is a recent and rapidly evolving field in the technology-driven business world. The volume of data is growing exponentially. Big Data consists of large, diverse, complex, longitudinal, and distributed datasets generated by instruments, sensors, Internet transactions, email, video, click streams, and other sources, whereas Open Linked Data focuses on opening and combining data. Data can be released by public organizations as well as by private organizations or individuals. Big Data analytics can be used to promote better utilization of resources and improved personalization [5].

The smart city is a concept that is gaining increasing attention. Smart cities refer to types of technology-supported innovation in urban spaces and city governments [6]. The term ‘smart city’ is rapidly replacing the original concept of the sustainable city [7]. However, whereas ‘sustainability’ has a clear scope, the term ‘smart’ is more loaded and unclear. Other terms, such as intelligent city, knowledge city, information city and ubiquitous city, are also used [7]. What actually constitutes ‘smart’ remains largely unknown [8].

While various studies have been conducted on Big Data [5] and Open Data [9–11], only a few studies combine these two concepts [e.g., 1, 12]. Existing research leaves a void regarding how BOLD can be used to create smart cities and how it contributes to their ‘smartness’. Moreover, the organizational implications of BOLD have received limited attention [13]. Hence, the objective of this paper is to investigate the relationship between smart cities and BOLD. The paper contributes by integrating two strands of e-government research, namely the open and Big Data literature and the smart city literature, and by investigating the organizational implications of BOLD.

This paper is organized as follows. First, we define the key concepts of this study, namely smart cities and BOLD. Subsequently, we present the research approach: we followed an inductive method and investigated the patterns derived from two case studies. Next, we describe the two cases, followed by a discussion of the challenges and opportunities for combining the smart city and BOLD literature. Finally, we draw conclusions about how the two research streams can be combined and what can be learned from this.

2 Research Background

In this section we provide a brief account of the concepts of smart cities (Sect. 2.1) and BOLD (Sect. 2.2).

2.1 Smart Cities

Smart cities are a response to the challenges that cities face in meeting objectives regarding socio-economic development and quality of life [14]. The smart city concept has been defined in different ways [2, 15], with definitions ranging from smart urban space to environmentally healthy smart cities [7]. Others attempt to characterize smart cities using dimensions such as (1) management and organization, (2) technology, (3) policy, (4) governance, (5) people and communities, (6) economy, (7) built infrastructure, and (8) the natural environment [16]. Another framework proposes six key dimensions: a smart economy, smart mobility, a smart environment, smart people, smart living and smart governance [17]. The concept of smart cities has become inflated, to the point that almost any ICT application can be called smart, and the notion of ‘smartness’ is often poorly defined or explained. A city can be called ‘smart’ “when investments in human and social capital and traditional (transport) and modern (ICT) communication infrastructures fuel sustainable economic growth and a high quality of life, with a wise management of natural resources, through participatory governance” [17, p. 70]. In our opinion, smartness does not refer to the bricks and mortar of the city, but to the mechanisms that improve the use of resources in the city. Furthermore, we hold that a city can only become smarter when its inhabitants are empowered and thus also become smarter.

2.2 Big and Open Linked Data (BOLD)

Size is only one aspect of Big Data [18]. Big Data are commonly characterized by three or more Vs: Volume, Velocity and Variety [19], and additionally Value, Variability and Veracity [18]. Big Data and Big Data analytics refer to “datasets and analytical techniques in applications that are so large (from terabytes to exabytes) and complex (from sensor to social media data) that they require advanced and unique data storage, management, analysis, and visualization technologies” [20, p. 1166]. Big Data and Open Data are closely related, yet they are not the same. While Big Data is characterized by its size, Open Data is characterized by its free availability. There is, however, discussion about the volume needed to make data ‘big’ and the degree of openness needed to deserve the name Open Data.

In processing data, data analytics play a role by providing deep insight and by influencing the decision-making processes of public organizations and citizens, which often affects how resources are used to create products and services in smart cities. Big Data analytics require advanced data storage, management, analysis, and visualization technologies that traditional business analytics cannot offer. These technologies include, among others, the fusion of various data sources, real-time analysis, online analytical processing, business performance management, data mining, machine learning, cloud computing, distributed processing, parallel algorithms, and parallel Database Management Systems (DBMS) [18].

3 Research Method

This paper aims to investigate the complementarity of the smart city and Big and Open Linked Data research streams. The research approach for attaining this objective is as follows. Since this study aimed at enhancing our understanding of how research on BOLD and smart cities can complement each other, a qualitative, case study-based approach was used [21, 23] to arrive inductively at the results. Inductive approaches combine theory and practice and are especially appropriate for dealing with ill-defined problems and for theory development. This study uses theoretical sampling to select the cases, which is appropriate for exploring a relatively new field, stimulating the extension of emergent theory and providing examples [22], in line with our objective. The following criteria were used for the selection of the cases:

  • The cases concern large cities that have started Smart City initiatives and have already established a BOLD infrastructure;

  • The cases use predictive analytics on BOLD;

  • The cases concern data on different societal topics;

  • The cases represent BOLD and Smart City initiatives in different countries;

  • The cases represent cities of different sizes;

  • Case study information should be available and accessible.

We opted to compare BOLD and Smart City initiatives on different topics, in different countries, and in cities of different sizes, to obtain insights from comparing these contexts. Based on the selection criteria, we selected the following cases:

Smart Energy. This case study examined a Smart City and BOLD initiative of the city of Amsterdam in the Netherlands. Amsterdam has approximately 800,000 inhabitants. The case focused on the topic of energy saving, and more specifically, on using smart grids and open data on energy to obtain insights about how energy can be saved in the Amsterdam New West district.

Smart Mobility. This case concerned the Smart City and BOLD initiative of the city of Rio de Janeiro in Brazil. Rio de Janeiro has 6.5 million inhabitants. The case focused on the topic of mobility, and, more specifically, on using open public transport and traffic data to obtain insights into how public transport and traffic can be improved in Rio de Janeiro.

Various information sources were used to investigate the cases, including reports, documents, applications, websites and observations. From the case studies we inductively arrived at the results by comparing the smart city and BOLD aspects and deriving how each was used. The inductively derived concepts and theory can then be used, together with other theories, as an initial theory in further research.

4 Case Studies: Smart Energy and Smart Mobility

This section provides the findings from the two case studies concerning smart energy (Sect. 4.1) and smart mobility (Sect. 4.2).

4.1 Energy Savings in Amsterdam

The city of Amsterdam developed an open data portal which currently contains 416 datasets and a number of APIs. The infrastructure is available via http://www.amsterdamopendata.nl/home. The infrastructure contains data about 18 different themes, including public space, traffic, economy and energy. There is also a reference to an application which provides so-called Smart City Data (http://amsterdam.smartcityapp.nl/), for instance about energy usage, windmills and purchase prices of residences. To become a smart city, the municipality and a number of other partners founded the Amsterdam Smart City (ASC) initiative in 2009. In the ASC initiative, companies, governmental agencies, knowledge institutions and citizens collaborate to make Amsterdam a smarter city. ASC focuses on five key areas, namely smart mobility, smart living, smart society, smart areas and smart economy. The envisioned smart city is one where social and technological networks provide solutions for increased sustainable economic growth, so that the quality of living in the city improves. ASC involves more than 100 partners and more than 75 projects. The projects deal with topics such as budget monitoring, sustainable electric vehicles, air quality, parking, noise pollution, digital road management for emergency services, and energy saving and sustainability [24].

Various ASC projects focus on energy saving. One of them is called “City-zen - Smart Grid” [25]. The project centers on the development of a smart grid for approximately 40,000 households in the Amsterdam New West district. In addition to the smart meters and solar panels already used in this district, computers and sensors are placed in the grid to monitor current and voltage. The smart grid is intended to reduce the number and duration of power outages, allow consumer-produced electricity to be fed back into the grid, better support electric vehicles, prevent large price increases for electricity transmission and support active participation in sustainable energy provision [25].

Liander, the organization maintaining the energy supply infrastructure in Amsterdam, provides open data about energy usage, including electricity and gas consumption, power outages, and smart meter usage [26]. It offers datasets on energy usage per year, per type of residence, per day, and other detailed data [27]. The data can be accessed and used free of charge. These open data offer energy monitoring opportunities for Amsterdam. For example, based on a combination of datasets from Liander and various simple data analytics, visualizations were created displaying the average monthly usage of electricity and gas across Amsterdam’s zip code regions, as well as information about the energy label of each building. Open data from the municipality of Amsterdam regarding installed solar panels and their potential can be added to this [28]. The combination of these datasets may be interesting not only to the municipality, but also to companies. For instance, a home insulation company looking for potential customers could identify residences with high gas usage and a low energy label, since these residences might be poorly insulated, and then target promotion campaigns at them. Similarly, a company that sells solar panels could approach the owners of buildings with high electricity usage and high solar potential, so that the buildings with the most energy saving potential are addressed first.
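
The dataset combination described above essentially amounts to joining datasets on a shared key (here the zip code region) and filtering the result. A minimal Python sketch of the insulation example follows, assuming hypothetical zip codes, usage figures, gas threshold and set of ‘low’ energy labels; none of these values are taken from the Liander or municipal datasets.

```python
# Hypothetical example data; the values below are illustrative only and are
# not taken from the Liander or municipal Amsterdam open datasets.
gas_usage_m3 = {  # average annual gas usage per zip code region
    "1061": 1850,
    "1062": 1200,
    "1063": 2100,
    "1064": 1900,  # no energy label available for this region
}
energy_label = {  # predominant building energy label per zip code region
    "1061": "F",
    "1062": "B",
    "1063": "E",
}

# Assumption: labels E, F and G are treated as "low" sustainability labels.
LOW_LABELS = {"E", "F", "G"}


def insulation_candidates(gas, labels, gas_threshold):
    """Return zip codes that appear in both datasets (an inner join on the
    zip code) and combine high gas usage with a low energy label."""
    shared_zips = gas.keys() & labels.keys()
    return sorted(
        z for z in shared_zips
        if gas[z] >= gas_threshold and labels[z] in LOW_LABELS
    )


print(insulation_candidates(gas_usage_m3, energy_label, gas_threshold=1500))
# ['1061', '1063']
```

Region 1064 drops out of the result because it has no label, which illustrates why a single dataset is not enough on its own.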

The foregoing shows that the Smart City initiative of Amsterdam can be complemented by open data that need to be analyzed with data analytics. However, a single dataset is not sufficient to obtain useful insights for energy monitoring in the city; datasets from different organizations need to be combined to draw useful conclusions. At the same time, some of the required energy data is not yet available as open data. For instance, the Smart Grid project focused on the Amsterdam New West district, and at the time of this study open electricity usage data was lacking for certain parts of the city. To allow open data to be used in the ASC projects, the relevant datasets need to be opened and used to make the city ‘smarter’.

4.2 Mobility in Rio de Janeiro

The City of Rio de Janeiro in Brazil has developed one of the biggest BOLD infrastructures in the world, with over thirty thousand datasets and seven APIs for real-time data (www.data.rio.rj.gov.br). The data come from three different sources. First, the Center of Operations Rio was created in 2011 to improve public service delivery in real time. This includes handling events such as car accidents or big events that change the routine of the city (e.g., the FIFA World Cup in 2014, the Réveillon New Year celebrations and the 2016 Olympic Games). The second stream of data is collected by Call Center 1746, where people complain about public services and the state of municipal infrastructure; these data can be used to plan public service maintenance. The third type of open data concerns general data about the city, for example data about weather and climate (e.g., rain, sea level, river level), health, education, social aid and transport.

The Smart City initiative of Rio de Janeiro was created to solve problems related to public transportation and traffic. With almost one car for every two people in the city (3 million cars for 6.5 million inhabitants), Rio suffers from huge traffic jams all over the city. The Center of Operations has operational control over data collection and collects around 4 GB of data every day. However, long-term planning of data collection was not its function, and it lacked the expertise to do so. Therefore, a Big Data group called PENSA (Ideias Room) was created. Specialists in geography, engineering, public administration, physics and computer science started to use the data collected by the City Hall and connected it to data from social apps, such as the Waze Social GPS.

The first result of this project was the resizing of the bus fleet used to transport two million people to Copacabana beach during the Réveillon New Year celebrations. When comparing 2014 with 2013, it was found that some neighborhoods had no bus routes, while social data collected in real time from cellphones showed that around 25 thousand people came from there to see the fireworks, revealing a mismatch between the bus routes and the number of people who wanted to use them. It was also found that for some routes the number of required buses had been overestimated and that these buses were not actually needed.
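
The fleet comparison behind this result is simple arithmetic: estimate the number of buses each neighborhood needs from the observed demand and compare it with the planned allocation. A minimal sketch follows, assuming a hypothetical bus capacity, number of trips per bus and neighborhood names; only the figure of 25 thousand passengers comes from the case.

```python
import math


def buses_needed(passengers, capacity_per_bus, trips_per_bus):
    """Minimum number of buses needed to carry all passengers when each bus
    makes `trips_per_bus` trips with `capacity_per_bus` people per trip."""
    if capacity_per_bus <= 0 or trips_per_bus <= 0:
        raise ValueError("capacity and trips must be positive")
    return math.ceil(passengers / (capacity_per_bus * trips_per_bus))


def fleet_gap(demand, planned, capacity_per_bus=80, trips_per_bus=3):
    """Per neighborhood: positive values are a shortfall of buses, negative
    values a surplus. The default capacity and trip count are assumptions."""
    areas = demand.keys() | planned.keys()
    return {
        area: buses_needed(demand.get(area, 0), capacity_per_bus, trips_per_bus)
        - planned.get(area, 0)
        for area in areas
    }


# Hypothetical neighborhoods: one without any planned route, one overserved.
demand = {"neighborhood_a": 25_000, "neighborhood_b": 3_000}
planned = {"neighborhood_a": 0, "neighborhood_b": 40}
print(sorted(fleet_gap(demand, planned).items()))
# [('neighborhood_a', 105), ('neighborhood_b', -27)]
```

Under these assumptions the first neighborhood lacks 105 buses while the second has 27 more than it needs, mirroring the two kinds of mismatch reported in the case.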

The second result of the project concerned a partnership with the Waze application (http://waze.com). The Rio City Hall recognized that not only the city has to be smart, but citizens also have to be assisted in making the best decisions and becoming smarter than before. The partnership with Waze allows citizens to send real-time information to the city about, for example, traffic conditions and accidents. The Rio City Hall, via the Center of Operations, also uses the application to send real-time information to citizens about route changes, flood routes, traffic jams and car accidents. The partnership aimed to improve people’s real-time access to data so that they could make the best decisions, spreading traffic problems more evenly over the city and increasing the speed of cars and buses, as measured by cameras and speed traps.

The third part of the project gathered all the data and displayed them on totems (video walls) of the Rio de Janeiro City Hall. These can be used to see the traffic conditions and how long cars take to reach their destinations. The locations of the totems were planned based on the Big Data analysis by PENSA, which took into consideration the data from all apps and internal datasets and looked for the places with the most traffic jams (and consequently the highest number of people on the route). The totems displayed a mixture of real-time data from social apps on cellphones and data from the city of Rio (bus GPS data).

Fourthly, the project encompassed a partnership with the social app MOOVIT. This application allowed people that used buses in Rio de Janeiro to see, in real time, the condition of bus traffic and how long it takes the bus to arrive at the bus stop and at the destination. It uses the GPS devices that were installed on the buses.
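
The case does not describe how MOOVIT computes arrival times. Purely as an illustration of how GPS positions can be turned into an arrival estimate, the sketch below uses the straight-line (haversine) distance between bus and stop, a hypothetical detour factor to approximate the road distance, and an assumed average speed; a production system would more likely use the route geometry and live speeds.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two GPS points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def eta_minutes(bus_pos, stop_pos, avg_speed_kmh, detour_factor=1.3):
    """Rough arrival estimate: straight-line distance, inflated by a
    hypothetical detour factor, divided by an assumed average speed."""
    if avg_speed_kmh <= 0:
        raise ValueError("average speed must be positive")
    distance_km = haversine_km(*bus_pos, *stop_pos) * detour_factor
    return distance_km / avg_speed_kmh * 60


# Hypothetical bus and stop coordinates, with an assumed speed of 15 km/h.
print(round(eta_minutes((-22.9700, -43.1850), (-22.9800, -43.1900), 15.0), 1))
```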

The foregoing shows that the Smart City initiative of Rio de Janeiro can be complemented by open data, and that it requires the analysis of BOLD with data analytics. Nevertheless, collecting data is hard for several reasons. A large number of sensors needs to be available to track traffic in real time. Software and staff with data processing and analytics expertise need to be available. Finally, political support is necessary to gain access to open data.

5 Discussion

In this section we discuss challenges and opportunities for combining smart city and BOLD literature, using the case studies as illustrations for our arguments.

5.1 Opening of Data Sources: Primary and Secondary Measurements

In both cases, datasets originating from different (public) organizations were combined to draw useful conclusions. At the same time, some of the required data was not yet available as open data. Despite the many datasets that have already been opened, a main challenge is still the identification of data sources and making data available for use. The lack of willingness to share information among agencies is often a failure factor [29]. Data is easily mistaken for fact, but raw data is often full of mistakes and has low information quality, which makes it unsuitable for direct processing. In both cases the data is used for drawing conclusions at the individual level, which requires sufficient data quality. Successful open data use strongly depends on the quality of open data [30]. Big Data has an even worse connotation, as the term suggests assembling many facts to create greater insight, whereas combining many sources of low information quality might result in even lower quality, although not necessarily. The veracity characteristic of Big Data refers to the need and ability to deal with imprecise and uncertain data [18]. These problems are widely known in the literature; however, they may become more apparent when data are opened and used [31], as the opened data might be used for decision-making. Only if the actual user needs are known beforehand is it possible to process data in advance and make the results available.
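
The remark that combining low-quality sources may, but need not, lower quality can be made concrete with two simple probability calculations, under the assumption that errors in different sources are independent (an assumption that often does not hold in practice). Linking records across sources that each contribute a different attribute multiplies their accuracies, whereas cross-checking the same attribute across three sources with a majority vote can raise accuracy. The accuracy figures are hypothetical.

```python
def linked_record_accuracy(source_accuracies):
    """Probability that a record assembled from several sources is error-free,
    assuming each source contributes a different attribute and errors are
    independent: the accuracies multiply."""
    p = 1.0
    for acc in source_accuracies:
        if not 0.0 <= acc <= 1.0:
            raise ValueError("accuracy must lie in [0, 1]")
        p *= acc
    return p


def majority_vote_accuracy(p):
    """Probability that at least two of three independent sources report the
    same attribute correctly, so that a majority vote returns the right value."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    return p**3 + 3 * p**2 * (1 - p)


print(round(linked_record_accuracy([0.9, 0.9, 0.9]), 3))  # 0.729
print(round(majority_vote_accuracy(0.9), 3))              # 0.972
```

With the same 90% per-source accuracy, linking three sources lowers the share of fully correct records to about 73%, whereas voting over three redundant sources raises it to about 97%.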

The cases use both primary and secondary measurements. Primary measurement means that the phenomenon of interest is measured directly, such as the outside temperature or the number of cars passing by. Secondary measurement means that a surrogate is measured to estimate the phenomenon, like using gas consumption to determine the heating of a building. The heating also depends on the process of converting gas into heat, whose efficiency varies with the design of the conversion process. Primary measurement is often more factual, e.g. when sensors directly measure what is going on, whereas secondary measurement might give a view that is not always correct. Smartphones might be used as proxies for people’s needs, but smartphone use varies among populations, and a phone might not always be used by its owner. Furthermore, data might be subjective, and data collected from smartphones or social media might not be representative of the entire population. To be representative, a random sample of sufficient size needs to be taken from the population, and the measurements need to be independent. These conditions, which are necessary for using statistics, often do not hold.
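
The representativeness problem can be illustrated with a small calculation. Assume, hypothetically, a population of two groups that differ both in smartphone use and in the quantity being measured (for instance, trips per month). Observing only smartphone users over-weights the group that uses phones more, so the estimate is biased even without any measurement error.

```python
# Hypothetical population: group -> (size, share using smartphones, mean value
# of the measured quantity, e.g. trips per month). Figures are illustrative.
POPULATION = {
    "group_a": (60_000, 0.9, 20.0),
    "group_b": (40_000, 0.4, 50.0),
}


def population_mean(groups):
    """True mean over everyone in the population."""
    total = sum(size for size, _, _ in groups.values())
    return sum(size * mean for size, _, mean in groups.values()) / total


def smartphone_mean(groups):
    """Mean obtained when only smartphone users are observed."""
    observed = {name: size * share for name, (size, share, _) in groups.items()}
    total = sum(observed.values())
    return sum(observed[name] * groups[name][2] for name in groups) / total


print(population_mean(POPULATION))            # 32.0
print(round(smartphone_mean(POPULATION), 2))  # 26.86
```

Reweighting the groups to their population shares can reduce such bias, but only when the group sizes are known and phone users are representative within their group, conditions that are hard to verify for smartphone or social media data.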

5.2 Fusing Open and Closed Data

Primary and secondary data can both encompass open and closed data. In the two cases, open and closed data were often combined, and there is a thin line between them. In one case, the government bought a closed dataset, which allowed it to open the data for use by others. Sometimes open and closed data were combined to provide more insight; the results of these combinations are open, but the source data remains closed. There are various reasons for not opening data, including privacy and unclear ownership. Viewing openness as a continuous scale between open and closed data emphasizes that not only open data but also closed data should be used to identify improvements. This suggests that a narrow focus on open data alone might fail to realize the potential of smart cities.

Furthermore, the way data is collected is very important and can take various forms, as systematically described by the taxonomy for collecting and opening data in Fig. 1. The figure shows the data sources on the left-hand side, the users on the right-hand side, and the process of opening data in the middle. Data push and pull are shown at the bottom. Data push concerns the collection of data and its subsequent storage: the data is collected and pushed to a data store. Data pull concerns the delivery of data in response to user requests.
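
The push/pull distinction in the taxonomy can be sketched as two operations on a shared store. The class and method names below are hypothetical and only illustrate the two directions of data flow; they do not describe the systems used in either case.

```python
from collections import defaultdict


class DataStore:
    """Toy data store: collectors push observations in, users pull them out."""

    def __init__(self):
        self._records = defaultdict(list)  # topic -> list of records

    def push(self, topic, record):
        """Data push: a collector (e.g. a sensor) stores a new observation."""
        self._records[topic].append(record)

    def pull(self, topic, predicate=lambda record: True):
        """Data pull: deliver the records on a topic that match a user request."""
        return [r for r in self._records.get(topic, []) if predicate(r)]


store = DataStore()
store.push("traffic", {"road": "road_1", "speed_kmh": 12})
store.push("traffic", {"road": "road_2", "speed_kmh": 45})
congested = store.pull("traffic", lambda r: r["speed_kmh"] < 20)
print(congested)  # [{'road': 'road_1', 'speed_kmh': 12}]
```

Keeping the two operations separate mirrors the taxonomy: collection decisions are made on the push side, while what is delivered depends on the requests arriving on the pull side.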

Raw data can be made available directly or can be enriched or processed before actual use. This is often necessary to ensure privacy, to enhance quality or to meet other requirements, and there are various options for doing so. First, data can be de-identified by stripping off certain identifiers, such as names and addresses. Sometimes the identifiers are replaced in a way that still makes it possible to link the data back to an individual, whereas in other cases the data is fully anonymized. Tracing data back might be suitable if recommendations need to be made at the individual level (e.g., insulating your house would save money), whereas sometimes this might be risky or unwanted (e.g., energy usage data might be used by companies to make commercial offers). A second way of anonymizing data is aggregation, after which individual data can no longer be accessed. Whether this is suitable depends on the needs of the data user, as aggregated data cannot be used to draw inferences at a lower level of aggregation.
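
Two of these options, replacing identifiers in a re-linkable way and aggregating, can be sketched as follows. A keyed hash (HMAC-SHA256 from Python’s standard library) is one common way to pseudonymize: the data owner, who keeps the key, can recompute the token for a known person and link records back, while recipients without the key cannot easily do so. The key, field names and the minimum group size used to suppress small groups are hypothetical choices, not taken from the cases.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret held only by the data owner; never released with the data.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(identifier: str) -> str:
    """Replace an identifier (e.g. an address) with a keyed hash token.
    Anyone holding SECRET_KEY can recompute the token and re-link records."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate(records, group_key, value_key, min_group_size=10):
    """Average `value_key` per `group_key`, dropping groups that are too small
    to hide individual values (the threshold is an assumption)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {
        key: sum(values) / len(values)
        for key, values in groups.items()
        if len(values) >= min_group_size
    }


household = {"address": "Example Street 1", "zip": "1061", "kwh": 2900}
released = {**household, "address": pseudonymize(household["address"])}

readings = [{"zip": "1061", "kwh": 100}] * 10 + [{"zip": "1062", "kwh": 50}] * 2
print(aggregate(readings, "zip", "kwh"))  # {'1061': 100.0}
```

The pseudonymized record still supports individual-level advice, whereas the aggregate, as noted above, no longer allows inferences below the zip code level.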

Fig. 1. A taxonomy for forms of collecting and opening data

A third way of dealing with data is to make only the results of the processing available. The disadvantage of only having the outcomes of the processed data is that there is no direct access to the source which might be necessary for interpretation or for determining alternative ways of processing data. The advantage is that individual data does not need to be opened.

5.3 Driving Innovations: Balancing Data Push and Pull

Creating innovations with data is a complex process in which both the available data and the users’ demands need to be taken into account. The term ‘data-driven’ suggests that many efforts are driven by data rather than by needs. In some cases, we found that the actors first thought about how to create something that would be useful and attractive for users, and only thereafter investigated the data. The reverse process was also found, in which data were first opened and thereafter used to come up with new ideas. Innovations were often inspired by existing examples that were transferred to the situations of the two smart cities. In the cases, the actors examined an idea, and subsequently the potential and limitations of the data for the smart application were determined. In Rio, for example, traffic information was collected to improve bus routes, which resulted in significant savings. In addition, the traffic information was used by the regional police system to improve the timeliness of police work and to improve their security.

The innovations involved multiple actors, including experts with domain knowledge of the data, database specialists with knowledge about releasing data, analytics experts and people representing the needs of end users. Having a multidisciplinary development team seems to be a key success factor, although the cases mainly involved IT people and did not focus on user involvement. Given the huge amount of data and opportunities, keeping the project in scope and preventing scope creep is a key issue: much can be done with the data, and not all opportunities can be explored. The focus is on creating value by exploring the data that can be used. A trap is to develop many tools and websites without having the user in mind. For users to adopt and correctly interpret the tools, the gap between developers and users must remain small.

Opening the data is not sufficient, and the use of the data needs to be encouraged and promoted [1]. Once an innovation has been explored the focus shifts to the creation of network externalities. Network effects or network externalities refer to the dependence of the value of a good or service on the number of other people who use it [32]. The bigger the volume of information, the more likely it is that users are attracted. A large volume of users might result in further innovation.
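
One common, and contested, way to formalize network effects is Metcalfe’s law, which takes the value of a network to grow with the number of possible pairwise connections among its n users, n(n-1)/2. The paper does not commit to this model; the sketch below only illustrates why value can grow faster than the number of users.

```python
def potential_connections(n_users: int) -> int:
    """Number of distinct pairs among n users: n(n-1)/2."""
    if n_users < 0:
        raise ValueError("number of users cannot be negative")
    return n_users * (n_users - 1) // 2


for n in (10, 20, 40):
    print(n, potential_connections(n))
# 10 45
# 20 190
# 40 780
```

Doubling the number of users roughly quadruples the number of possible connections, which is one way to read the claim that a large user base can trigger further innovation.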

5.4 Does BOLD Result in Smartness?

We started this paper by asking how BOLD can be used to create smart cities. The two cases clearly show that BOLD can contribute to smartness in two ways: (1) by linking and combining data sources, resulting in new insights, and (2) by employing data and predictive analytics. This does not rule out other ways of creating a smart city. For example, the availability of broadband is often considered part of the smart city concept (e.g., [8]), but has nothing to do with BOLD. When we examine the intelligence used to process the data, the algorithms used are relatively simple and straightforward. The challenge often lies in making the data ready for use and combining them with other data.

The case studies showed that the development of services and applications for citizens requires the analysis of BOLD. The combination of data, data analytics and predictive analytics was used to create applications or to make sense of the data, especially when datasets from different organizations needed to be combined to draw useful conclusions. Predictive analytics can be used to uncover patterns and capture relationships in data. Therefore, not only the data but also the tools to analyze the data need to be accessible and useful to their users. Balancing data-driven and user-driven innovation is a key aspect.
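
As an illustration of how simple predictive analytics can be, the sketch below fits an ordinary least-squares line to paired observations and uses it to predict a new value. The variables (for instance, vehicles counted on a road segment and the resulting travel time) and the data are hypothetical; the cases do not report which models were used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new observation."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: vehicles (hundreds) on a segment vs. travel time (minutes).
vehicles = [1, 2, 3, 4]
minutes = [3, 5, 7, 9]
model = fit_line(vehicles, minutes)
print(model)               # (2.0, 1.0)
print(predict(model, 5))   # 11.0
```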

In both cases the role of the users is key to achieving improved use of resources. A smart city only becomes smart when there are smart citizens, businesses, civil servants and other stakeholders. In both cases participatory processes were used to involve the people in the cities. Connecting data and people, so that people are able to make use of the data, is a key issue. In Amsterdam citizens can identify opportunities to save energy, and in Rio to improve their travel planning. As such, the concept of smart cities largely depends on what can be called smart citizens: citizens who are able to take advantage of the knowledge and to reduce their actual resource consumption, thereby accomplishing the objectives of smart cities.

6 Conclusions

The objective of this paper is to investigate the complementarity of the smart city and Big and Open Linked Data research streams. Two case studies, in smart energy and smart mobility, were investigated in different countries. The main challenge of using BOLD to create smart cities is the identification of data sources and making data available for use. For this purpose, a taxonomy for forms of collecting and opening data was derived. Initiatives aimed at opening data can use this taxonomy to determine which ways of opening data are appropriate. We recommend further refining this taxonomy.

In both case studies a key element is that open data is linked to and mixed with closed data. This suggests that focusing primarily on open data is too narrow a view. In addition, in both cases data analytics are used to improve the use of resources in smart cities, with big and open data playing a pivotal role. BOLD enables the use of data and predictive analytics to improve the use of resources in the urban area. The data analytics used in both cases are rather simple, which shows that much can be accomplished with simple techniques. Linking and combining data and using data analytics can result in improved decisions and better utilization of resources, and in this way contribute to the smartness of cities. Realizing these benefits depends on smart citizens: citizens who are able to take advantage of the knowledge and in this way better utilize resources. In the future, more comprehensive and advanced data and predictive analytics might be employed to make even better use of resources and make cities smarter.

Combining BOLD and smart cities proves to be a suitable combination, and we call for more research on the adoption, usage and impact of data for smart cities. A limitation of this conceptual study is that the findings are illustrated by only two case studies from different countries. We suggest conducting more empirical research and investigating the use of data analytics for creating smartness with a larger sample.