Emerging Roles for Optimising Re-Use of Open Government Data

This paper describes a small-scale study to investigate the missions, services and operational tasks provided by four open government data centers: NYC OpenData (New York Open Data Center), DataSF (open data portal of San Francisco), WPRDC (Western Pennsylvania Regional Data Center) and the London Datastore (Greater London open data portal). The findings are used to propose three emerging specialist data roles for open government data (OGD) centers. The methodology used was an analysis of the textual content of the data center websites to identify the common elements of the mission and services. A common mission across all four open government data centers was ‘to improve the use of data’. The range of data center services and tasks identified and extracted from the websites could be classified into five common categories: Availability, Understandability, Technical Help, Social Engagement, and Improve User Data Literacy. Three new specialist open government data roles were proposed, which were framed to facilitate the delivery of the services identified in this study: Data Interpreter, Data Consultant and Data Visual Assistant. In parallel with existing research data policies and guidelines, these three specialist OGD roles could be extended and applied across other open data portals and domain-based data centers, including research data repositories, to optimise the delivery of open data, to facilitate greater value from data sharing, to maximize the understanding of complex data and to minimize the subsequent misuse of data.
Received 22 January 2018 ~ Accepted 22 January 2018
Correspondence should be addressed to Liz Lyon, University of Pittsburgh, School of Computing and Information, 135 N Bellefield Ave, Pittsburgh PA 15260. Email: elyon@pitt.edu
An earlier version of this paper was presented at the 13th International Digital Curation Conference.
The International Journal of Digital Curation is an international journal committed to scholarly excellence and dedicated to the advancement of digital curation across a wide range of sectors. The IJDC is published by the University of Edinburgh on behalf of the Digital Curation Centre. ISSN: 1746-8256. URL: http://www.ijdc.net/
Copyright rests with the authors. This work is released under a Creative Commons Attribution Licence, version 4.0. For details please see https://creativecommons.org/licenses/by/4.0/
Xiao, Lyon, Zou and Gradeck. International Journal of Digital Curation 2018, Vol. 13, Iss. 1, 362–372. DOI: 10.2218/ijdc.v13i1.609 (http://dx.doi.org/10.2218/ijdc.v13i1.609)


Introduction
Acknowledging the value of Open Government Data (OGD), open data centers have been rapidly proliferating in the United States, Europe, and Asia. These centers publish increasing volumes of datasets which have been collected or used by governments, such as transportation and environmental data. With the growth of OGD, the functions of the associated infrastructure platforms are not limited to simply supporting data accessibility. The broader use of these data has become the main goal of the centers; indeed the full value of the centers cannot be realized until these datasets are widely used. Manyika et al. (2013) suggest that by applying advanced data analytics, citizen use of open data could produce $3.2 to $5.4 trillion in economic value per year across several domains. Therefore, in order to empower the use of open government data, these data should not only be available in consistent and easily usable formats, but also be understandable. Given this challenging goal of many OGD projects and the current types of open data (mainly complex quantitative data), open data centers have created several positions so that data portals function efficiently. These roles may be divided into general roles and specialist roles. General roles include data center manager, programmer, data analyst, and training specialist. However, new specialist job types are also appearing, which are the primary focus of this study.
In this paper, we have explored four local level OGD centers: NYC OpenData (New York Open Data Center), DataSF (open data portal of San Francisco), WPRDC (Western Pennsylvania Regional Data Center) and the London Datastore (Greater London open data portal). Local level portals were selected for this study because these platforms are likely to be more connected with civic organizations, neighborhoods, and communities. The field of information and data science has a growing interest in this domain, since the diverse challenges of curating and managing these data, facilitating access and reuse of data through dedicated user tools and services, plus the need to train people to improve their information or data literacy skills, are critical current themes for iSchool research and education programs. Three research questions are addressed here: (1) What are the common missions of open government data centers? (2) What user services and supportive tasks are provided by open government data centers? (3) Which specialist job types are needed to deliver these OGD services?

Literature Review
There has been much prior discussion of the requirements to develop workforce capacity and capability for data science and data stewardship (Lazer et al., 2009; Pryor and Donnelly, 2009; Bakhshi, Mateos-Garcia and Whitby, 2014; National Research Council, 2015). This literature has also explored the nature and functions of a range of supporting roles and positions, using a varied taxonomy to categorize the different job types. Six broad data scientist roles were described by Lyon and Brenner (2015): data analyst, data archivist, data engineer, data journalist, data librarian and data steward/curator. Their likely organizational locations plus a brief summary of their key tasks were also proposed. These data science roles were explored in more depth in two further studies (Lyon, Mattern, Acker and Langmead, 2015; Lyon and Mattern, 2016). These two reports describe an analysis of the real-world requirements for a range of positions across different job sectors and highlight the specific qualifications, knowledge, experience, skills and competencies for each role.
Despite the substantive research on broader data science roles, there appears to be a lack of research on these roles within the open government data context. Since the development of Open Government Data initiatives, and in particular the development of OGD portals, which have proliferated since the mid-2000s both at federal and local government levels, governments have been actively seeking ways to make their data more easily accessible, usable and re-usable by all (Ubaldi, 2013). One complex challenge of open data is understandability: data users sometimes find it difficult to interpret the data. The data in open data platforms are most often available in raw data formats (Weerakkody et al., 2017), and users are often unfamiliar with the definitions or categories that are adopted to present the data. Another challenge is that users are required to have a certain level of skills to use the data (Kapoor, Weerakkody and Sivarajah, 2015). In general, user studies have found that potential open data users lack the professional knowledge or skills to interpret or use the data (Martin, 2014; Janssen et al., 2012).
This current study seeks to begin to remedy the lack of research around OGD roles and to contribute to the field by providing a small-scale analysis of selected OGD portals, their associated user services and requirements for supporting data roles.

Methodology
This study focuses on four local-level open data portals: NYC OpenData (New York Open Data Center), DataSF (open data portal of San Francisco), WPRDC (Western Pennsylvania Regional Data Center) and London Datastore (Greater London open data portal). These four particular open data platforms were chosen by considering the following perspectives. First, city size, scale and geographical distribution: New York City (NYC), the City of San Francisco (SF), Pittsburgh and London are substantial metropolitan urban areas. NYC and SF are located in the east and west of the United States respectively, Pittsburgh is located in a more central US location and London is an international city in the United Kingdom. Taken together, these four cities represent a broad geographical spectrum, whilst all being cities of significant size with substantive local citizen populations. As a result, the OGD centers within these cities are able to collect and provide access to large amounts of data through their infrastructure platforms and services. A second perspective is the maturity of these OGD platforms. For example, DataSF was launched in 2009, the original London Datastore was launched in 2010 (Arthur, 2010) and NYC OpenData was set up in 2012. As a result, these centers have been exploring and developing the methods and services that can facilitate data reuse for many years. The Pittsburgh-based WPRDC was established in 2015, and whilst it is the most recently established OGD center, it draws on many efficient operational methods, standards and data practices from those relatively mature data platforms. Therefore, WPRDC is a well-formed OGD center. Furthermore, from the perspective of familiarity, the first author worked for WPRDC for a year as a graduate student researcher, and the fourth author is the current director of WPRDC. As a result, we have an excellent understanding of the work of the WPRDC and of open government data centers in general.
The methodology utilized a content analysis of the four selected open government data center websites. The content analysis collected, examined and analyzed three key classes of information: the mission statements, the range of user services, and the associated supporting tasks provided by the four portals. In the first step, to collect data for the in-depth analysis, one coder manually examined and extracted the three classes of information from the four platforms' official websites on January 10th, 2018. For each website, the coder first located and identified the relevant information, and then classified this information into different categories according to thematic similarity. The coding results were then verified by a second coder, to ensure that both coders were working consistently. All the collected data were stored in an MS Excel spreadsheet and then manually analyzed. The four websites examined are listed in Table 1. To further illustrate the text extraction and analysis process: the coder extracted raw data about the missions of the selected OGD centers, the services they provided, and the tasks they perform to support those services. In this first step, the coder simply collected relevant data into the three classes; Table 2 shows an example of part of this data collection. In a second step, based on the nature of the data center services and tasks, the coder classified this information into one of five categories: Availability, Understandability, Technical Help, Social Engagement (Interactive) and Improving User Data Literacy. Table 3 explains the five categories.
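The two-step coding procedure described above can be illustrated with a short sketch. The study recorded its codes by hand in a spreadsheet, so the Python below is only an illustrative stand-in: the record fields, the helper name `tally_by_category` and the "Portal A" and "Portal B" rows are hypothetical and are not taken from the study's data. Only the category names and the WPRDC Data Guides example come from the text.

```python
from collections import Counter
from dataclasses import dataclass

# The five categories named in the study.
CATEGORIES = (
    "Availability",
    "Understandability",
    "Technical Help",
    "Social Engagement",
    "Improving User Data Literacy",
)

# Services and tasks were assigned a category; missions were collected
# as well, but are not categorized in this sketch.
CODED_CLASSES = ("service", "task")


@dataclass(frozen=True)
class CodedItem:
    """One extracted piece of website text plus its hand-assigned category."""
    portal: str
    info_class: str  # one of CODED_CLASSES
    text: str
    category: str    # one of CATEGORIES

    def __post_init__(self):
        if self.info_class not in CODED_CLASSES:
            raise ValueError(f"unknown class: {self.info_class}")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def tally_by_category(items):
    """Count coded items per category, keeping categories with zero items."""
    counts = Counter(item.category for item in items)
    return {category: counts.get(category, 0) for category in CATEGORIES}


# The first row reflects the Data Guides example in the text; the
# "Portal A" and "Portal B" rows are invented placeholders.
example_items = [
    CodedItem("WPRDC", "service", "Data Guides", "Understandability"),
    CodedItem("Portal A", "service", "placeholder service", "Availability"),
    CodedItem("Portal B", "task", "placeholder task", "Technical Help"),
]

print(tally_by_category(example_items))
```

Validating the category at construction time mirrors the role of the second coder in a small way: an item cannot be stored with a label outside the agreed scheme.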

Results and Data Analysis
The missions extracted from the official websites of the chosen OGD centers are shown in Table 4. Based on the missions, the four platforms provide corresponding common services for data users. The extracted services were selected based on at least three platforms offering similar functions. These functions were then mapped onto one of the five service categories as shown in Table 5. The data for supportive tasks was primarily extracted and analyzed from WPRDC and DataSF websites, because only these two data centers provided detailed information about their staff and their work on the official websites. The extracted tasks were classified into the same categories as the services, since the tasks that the OGD centers have performed are to directly support the services. The classification of the specific tasks is illustrated in Figure 1.
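The selection rule described above, under which a service is retained only when at least three of the four platforms offer a similar function, can be expressed as a simple filter. The following Python sketch is an illustration under stated assumptions: the portal names and service labels are hypothetical placeholders rather than the contents of Table 5, and it presumes that the coder has already given similar functions a shared label.

```python
from collections import defaultdict

MIN_PLATFORMS = 3  # threshold stated in the text: at least three platforms


def common_services(services_by_portal, min_platforms=MIN_PLATFORMS):
    """Return the service labels offered by at least `min_platforms` portals."""
    portals_offering = defaultdict(set)
    for portal, services in services_by_portal.items():
        for service in services:
            portals_offering[service].add(portal)
    return sorted(
        service
        for service, portals in portals_offering.items()
        if len(portals) >= min_platforms
    )


# Hypothetical placeholder mapping, invented for illustration only.
example = {
    "Portal A": {"service_w", "service_x", "service_y"},
    "Portal B": {"service_w", "service_x", "service_y"},
    "Portal C": {"service_w", "service_y", "service_z"},
    "Portal D": {"service_w", "service_z"},
}

print(common_services(example))
```

Collecting the set of portals per service, rather than a simple count, keeps the evidence for each retained service available for later inspection.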

Discussion
Returning to the first of our research questions, in Question 1 we asked: What are the common missions of open government data centers? From the extracted content describing the missions, we can see that the four OGD platforms have a common mission, which is 'to improve the use of data'. Whilst this common mission is expressed in subtly different language, the ultimate goal is the same for each open government data center (Figure 2). Our second research question asked: What user services and supportive tasks are provided by open government data centers? In order to achieve their common mission, the OGD platforms have set out to provide a range of services which are not limited to simply publishing data. In addition to the fundamental work of ensuring access to data (i.e. availability), the platforms carry out many micro-practices for increasing the use of data. For example, to help to make the data understandable, platforms have begun to provide showcases and data analysis reports to help users learn about the data. Some data centers have produced user guides in addition to providing metadata about their datasets and a data dictionary. The WPRDC offers Data Guides that contain contextual information about datasets. The Data Guides are primarily created to assist users in making sense of the open data, and in particular of the complex quantitative datasets. Additionally, the four centers support users in visualizing the datasets available through their platforms, by using a range of online tools. Although this function is still under development, it represents a trend which OGD platform developers are following. To improve user data literacy, some data centers provide Help Desk support, answer user questions and deal with many technical issues. All of these services contribute to improving the use of the data, as stated in the common mission.
The third and final research question asked: Which specialist job types are needed to deliver these OGD services? Our content analysis of the four OGD websites, and the identification of the specific services and operational tasks provided, have led us to propose three specialist open government data roles or positions, which are described here in more detail.
Data Interpreter: The goal of this role is 'to make sense of data for users'. A data interpreter's specific activities consist of working with data providers to create data guides, collecting and creating data-related blogs and data stories, working independently or with programmers to make maps or other visualizations, and informing data-related policies.
The role starts with open government data. A data interpreter is responsible for interpreting data in various ways, such as providing contextual information. The data will then be more explainable and understandable. In addition, the interpreted information not only helps users understand data, but also lowers the concern of data providers regarding misinterpreted data.
Data Consultant: The goal of this role is 'to directly assist users to understand the data and teach them technical skills to accurately use the data'. This job requires the consultant to hold help-desk hours each week to help users who have difficulties with the data, especially technical difficulties; to organize meetings to collect information from various groups of people, including their data category needs or required tools; and then to find technical solutions that meet those needs.
This role starts with OGD users. One of the goals of OGD platforms is to reach more citizens and thus to increase the use of open data. Most of the OGD is raw data, and using the raw data requires a certain level of data processing skills. However, the data literacy levels of OGD staff are often very different from those of the public. Data literacy levels may also vary between different members of the public. Hence, data consultants can directly help users to understand and use data based on the specific user questions, and then ultimately improve public data literacy.
Data Visual Assistant: The goal of this role is 'to assist users to visualize or manipulate data by developing tools that can be directly and easily used by users'. This role's focus is to develop software tools and apps that can easily create data visualizations, such as line charts, bar charts, maps and other infographics. Users can then use the graphical tools and apps to create the specific visualizations they want by simply selecting the parameters and applying them to the whole dataset or data sub-set.
This role starts with tools. A data visual assistant makes the open government data more visible, more discoverable and more understandable, and helps civic users to get credit for their 'mash-ups'.
Furthermore, these three OGD roles do not operate in isolation; rather they work together as an effective OGD team. The data interpreter and the data visual assistant both help users to gain new insights through expert exemplar interpretation and the provision and application of customized user tools. The data consultant offers public users professional help, to assist them in acquiring an in-depth understanding of open government data and its value, plus the opportunity to enhance and build their own individual data skills. Figure 3 summarizes the connections between the OGD mission, user services, supporting tasks and the new specialist OGD roles which, it is hoped, will greatly contribute to solving the OGD challenges previously identified in the literature.
Looking beyond open government data, although scientists agree with the potential benefit of data sharing/reuse for scientific progress, the majority are reserved when it comes to practical implementation. Researchers who are reluctant to share data with others reported major concerns with legal issues, misuse of data, and incompatible data types (Tenopir et al., 2011). Although many research data centers and publishing platforms (Scientific Data, F1000Research, DataONE, etc.) offer data policies and guidelines in support of data sharing/reuse, a large number of researchers still report a risk that data may be misinterpreted due to the complexity of data (Tenopir et al., 2015). This risk could be reduced when data interpreters work with data providers to create contextual information for a particular data set. The OGD specialist data roles identified in this study could also be applied to research data centers and repositories. In combination with existing research data policies and guidelines, data interpreter/data consultant roles in each domain could help to maximize the understanding of complex data and minimize the subsequent misuse of data.

Conclusions and Next Steps
This exploratory research has proposed three new specialist OGD roles, and builds on prior work which has described generic data science roles. We acknowledge that this has been a small-scale study examining the websites of just four open government data centers. However, the methodology used in this study could be extended and applied across other open data portals, to provide a more substantive baseline reference. Furthermore, the findings of this study may provide valuable indicators for open government data portal managers in developing strategy, planning operational services and allocating resources for new positions to deliver on such plans. The high-level descriptions of the three new specialist roles, together with the description of the microtasks which they may deliver, provide a good foundation for putative job descriptions for open government data centers to use in the future.
We plan to carry out a further study to investigate the concrete skills, competencies and knowledge that are required for the three specialist OGD roles proposed in this paper.We believe that the role requirements would not only contribute to OGD centers to help these organizations to effectively find suitable candidates and to develop their

Figure 1. Supporting tasks performed by the open data platforms.

Figure 2. The common mission of the four open data platforms.

Figure 3. Critical inter-relationships for optimizing the re-use of open government data.

Table 1. The OGD official websites.

Table 2. OGD website data collection exemplar.

Table 3. The five categories used to classify OGD website content.

Table 4. Missions of selected open data platforms.
WPRDC (source): http://www.wprdc.org/performance-management/
London Datastore: We want everyone to be able access the data that the GLA and other public sector organizations hold, and to use that data however they see fit – for free (source: https://data.london.gov.uk/about/)

Table 5. Services provided by selected open data platforms.