Accountability by design in technology research

Introduction
What does research look like in practice? Aside from popular assumptions of how researchers are lonely, isolated individuals sitting disconnected from the rest of the world, enmeshed in thought, a considerable part of research involves working with data. Whether this data is quantitative, qualitative, gath- […] for this endeavour. This idea of the verification of science as a social technology requires that material and literary technologies exist, enabling a process of experimentation, documentation and verification of scientific knowledge. However, even in the 17th century, "not everyone could come to London to see those experiments for themselves" (Law, 2017, p. 36). Thus the process of accurate documentation of the experiment became particularly important. We argue that in recent years the quality of documentation of technology research has been neglected, particularly in areas of high-risk and high-impact research. While other disciplines such as medicine have invested considerable energy in ensuring their results are transparently documented and externally valid, there is still considerable work to be done to ensure the same level of rigour in technology research. This challenge is not a challenge of documentation alone. Scientific endeavour strives to produce reliable and, in some fields, reproducible knowledge. Acknowledging that all knowledge is situated and that researchers over-emphasising the objective validity of their own research may be falling for their own "god trick" (Haraway, 1988, p. 584) does not take away from the need for rigorous documentation. If anything, the reality of science requires "engaged, accountable positioning" (Haraway, 1988, p. 590) in a manner that allows for "less false claims" (Harding, 1992, p. 587) rather than objectively true ones. To state that scientific claims are dependent on "all the procedures of the sciences (at best) can generate" (Harding, 1992, p. 587) is not a failure but appreciates the reality of what it means to conduct science. As a result, "scientists and social scientists need to be accountable for what they write, rather than hiding behind the fiction that what they are reporting comes direct and unmediated from nature" (Law, 2017, p. 35). Seen from this perspective, better documentation of technology research can ensure not just a more accurate account of its process but also contribute to safeguarding the core of what constitutes good research itself. While the goal of "strong objectivity" (Harding, 1992) may remain hard to achieve, better documentation of the results of scientific research can help ensure meaningful accountability of scientific research.
At the same time, this approach builds on the EU General Data Protection Regulation (GDPR), specifically the accountability principle in Article 5(2) and Privacy by Design in Article 25 (Regulation, 2016). Academic research deals with large amounts of research data, which frequently include personal data. A lack of accountability in the use of research data can have grave consequences both for the data subjects whose data is being stored or processed incorrectly, and for the societies that base their decision-making on the results of unaccountable academic research. Rather than relying just on data protection authorities, who are already struggling to keep up with technological developments (Raab and Szekely, 2017), or data protection impact assessments (Binns, 2017; Gellert, 2017) to ensure the accountability of academic research, we instead propose an accountability by design approach. When considering the accountability duties of controllers in Article 24 of the GDPR (Regulation, 2016), a by design approach that minimises potential risks seems to be a particularly appropriate response (Gellert, 2018).
The following article argues that accountability cannot be developed just by providing output data to be considered in research, or by providing explanations for technical outcomes as is frequently proposed in computer science (Diakopoulos, 2015; Tufekci, 2015; Wagner, 2016); it can only be developed by better understanding the research process. In order to do this, we suggest a series of mechanisms that can be built into existing research practices to make them more intelligible to outside reviewers and scholars. These mechanisms are designed to develop the accountability principle of the GDPR and ensure more accountable scientific research. As the GDPR recitals also explicitly reference "data […] processed for scientific research purposes" (Regulation, 2016, p. 30), an accountability by design approach to technology research is grounded both in the Articles and in the recitals of the GDPR. By documenting the key elements of a narrative research story which explains not just what researchers believe they have discovered but also how they got there, it may also be possible to create better accountability mechanisms. Many technical and organisational mechanisms that could enable more accountable research processes already exist; however, they are not typically used in this context and are thus not fit for purpose. To link these individual mechanisms together, we then suggest the blueprint for a combined organisational and governance framework for accountability by design in technology research. In conclusion, we discuss potential pitfalls of such an approach as well as future research areas that would need to be explored in order to implement accountability by design for technology research in practice.
Within the context of this special issue, technology research is defined based on the work of the PanelFit EU H2020 research project. PanelFit researchers argue that "Large amounts of personal data are processed during research projects and several researches (mostly ICT researchers) produce algorithms that could be employed in several fields with significant effects on data subjects" (Malgieri, 2019), a challenge which integrates all disciplines from law to the social sciences, humanities and art. As such, we propose to define technology research as any form of research related to information technology, whether this is related to its legal, technical, social, political or other relevant aspects (Malgieri, 2020).
Throughout the article we also refer to 'high risk' technology research. Following the risk-based approach of the GDPR, we argue that high-risk technologies need to be interpreted using the GDPR, as such high-risk technology would typically require a data protection impact assessment following Article 35 of the GDPR (Regulation, 2016, p. 53). As such, an accountability by design approach in technology research constitutes a step towards mitigating risks in conducting high-risk technology research.

Existing models of accountability in technology research
Existing models of accountability in research always need to be seen in the specific social and historical context in which they developed. As discussed above, the initial understanding of valid scientific research and of those able to conduct and review it was limited entirely to aristocratic British men, as these were the only individuals considered able to think and judge the work of others independently (Law, 2017; Shapin and Schaffer, 1985).
Accountability in the context of this article means creating an account of the research conducted by an individual researcher and being able to understand how certain decisions were made in the research process (Bovens, 2010). The type of accountability suggested here is not financial or organisational but rather scientific, to ensure that the researcher upholds the highest standards of research towards the scientific community. As such, accountability is meant to reduce scientific malpractice and safeguard the highest scientific standards in areas where the impact of the results is likely to be great. It also attempts to make it easier to understand how and why scientific mistakes are made and, as a result, to move the scientific field in which the research takes place forward as a whole. Thus, the accountability by design approach proposed here can be seen as an 'accountability mechanism' (Bovens, 2010), ensuring a greater level of accountability for individual researchers.

Accountability through peer-review and publication
In this context it is important to remember that peer review, while historically considered part of the practice of science for many hundreds of years (Shapin and Schaffer, 1985; Spier, 2002), was not systematically institutionalised within everyday academic practice until the 1970s. Under pressure from academic funding authorities to justify their role in society, academic journals increasingly began implementing peer-review procedures systematically, as part of a wider debate to justify public funding for science (Baldwin, 2018). At the same time, peer review was always designed as a mechanism for the accountability of science while still ensuring scientific autonomy (Baldwin, 2018; Spier, 2002). Notably, academic movements towards accountability have been particularly strong in areas where the public impact of research findings is high. Thus, in areas such as medicine the practice of conducting not just peer review but 'Open Peer Review' has become increasingly common (Haffar et al., 2019; Ross-Hellauer, 2017). This means that both the reviews and the identities of authors and reviewers are fully transparent to all those involved in the process and also publicly available. In these contexts, it is also common to publish study protocols as open access documents before the studies themselves are conducted. A shift towards greater transparency and accountability in areas of high risk and vulnerability is an important part of the wider academic research process.

Accountability through provision of datasets
Another aspect of accountability that is common in academic research is the provision of datasets. This challenge is closely related to the crisis in numerous scientific disciplines around the replicability of experimental findings (Branney et al., 2019; Loken and Gelman, 2017; Shrout and Rodgers, 2018). This crisis has led to a wide debate on what constitutes acceptable science. For many disciplines, particularly in the area of quantitative research, it has been common for some time to provide datasets to the general public as part of the article publication process. This process has further gained traction under pressure from public academic funders such as the UK Economic and Social Research Council, which require open access not just to the text of the paper but to all relevant content, including "text, data, images and figures" (Branney et al., 2019, p. 485). This push for open access to data has also received considerable support from numerous scientists, who have provided frameworks for making both quantitative data (Arzberger et al., 2004) and qualitative data (Branney et al., 2019) more easily accessible to the public.

Accountability through ethics committees
Another key aspect of academic accountability is the role of ethics committees. Notably, these committees are far more common in areas where the impacts of the research are great and highly invasive on human or natural life, typically in areas such as medicine (Frauenberger et al., 2016; Israel, 2018). However, they have also become increasingly common across academic institutions which wish to collect original research data (Frauenberger et al., 2016). Notably, there is considerable divergence between ethics committees in different parts of the world, with different standards, institutions and processes implemented in different universities and countries (Edwards et al., 2007; Goodyear-Smith et al., 2002; Redshaw et al., 1996). In some countries ethics committees are a relatively uncommon phenomenon at universities, with researchers, institutes and faculties left to develop their own solutions to these challenges. As some of the countries that typically lack ethics committees are members of the OECD and EU, it would be wrong to see the existence of ethics committees as a phenomenon which generally develops over time across a wide variety of academic institutions. Rather, it seems that different institutions in different cultural and academic contexts make different choices. As with open data and open access provision, it is the funders of academic research who have driven the creation of ethics committees (Frauenberger et al., 2016). For example, it is necessary to have research vetted by an ethics committee in order to be able to apply for academic research funding provided by the European Union.
In conclusion, it seems fair to suggest that accountability in academic research is deeply interwoven with the societal and economic processes it is embedded in. It cannot be seen as disconnected either from societal contexts which require certain institutions or from research funding which has similar expectations. As a result, it can be argued that much of the institutional accountability developed for academic research is a result of outside pressures rather than inside innovation and development. Without extensive external pressure, it is unlikely that peer review, provision of datasets or ethics committees would be as common as they currently are. This extrinsic motivation for developing accountability mechanisms also needs to be considered in the context of the development of academic accountability measures going forward.

Developing this model further: accountability by design
In our review, accountability is about much more than simply having access to relevant source code or output data. Instead, accountability involves being able to meaningfully narrate how decisions were made in the development of a specific project. In this case the quality of the 'account' in accountability is particularly important: without a plausible and persistent narrative about the respective research project, it becomes very difficult to create any meaningful form of accountability, as the 'account' that the individual is being held to is not 'thick' enough. Without a robust account, the creation of accountability is not possible (Bovens, 2010; Crabtree et al., 2018; Nissenbaum, 1996; Robles et al., 2018). Moreover, the development of such an accountability model for technology research can draw on other scientific disciplines such as medicine, where many of the accountability by design practices discussed here are already practised more systematically.2 It can also draw on existing legislation, specifically the GDPR privacy by design duty (Article 25) and the accountability principle (Article 5(2)). In order to reduce the risks associated with a lack of accountability (Gellert, 2018), an accountability by design approach rooted in academic and legal best practices for data governance seems highly appropriate.

An overview of accountability by design model
What could an existing model of accountable technology research look like? In essence, it would involve the prepublication of initial assumptions, theoretical framework, research question, hypotheses, methodology, planned data collection, data from actual data collection, analysis outputs and the final publication in a centralised repository. Some but not all of these procedures are similar to those that already exist in the area of medical clinical trials, broadly in line with the SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) framework (Chan et al., 2013a, 2013b; Lancet, 2010; Schulz and Grimes, 2013).
As leading medical journals such as BMC Public Health also publish open access study protocols before receiving the final results of the study, this provides an avenue for creating publication incentives for academics to provide access to the initial designs of their studies. The main challenge is that, as these systems are journal-based, not all study protocols are always provided as open access to all interested parties, limiting those with access to journal subscribers. Moreover, the SPIRIT model is relevant for a very specific research model around clinical trials and may be too 'top heavy,' onerous or complex for similar applications in technology research.
2 https://bmcpublichealth.biomedcentral.com/articles/10.1186/s12889-019-6829-7
Another important issue is that technology research has other forms of data that are not typically considered by SPIRIT protocols but are equally important to understanding the development of technology research. Source code and log files, used both in the development of technological systems and in the area of quantitative academic research, are not typically considered as part of the SPIRIT framework. However, much of this information is key to understanding the research process as it develops, as well as the functioning and development of software.
As a result, in some areas of technology research, it is common to provide all source code for a research paper in a source code repository such as GitHub. Some authors also provide additional documentation or resources within such public repositories, including information on how to use the software, FAQs, how and by whom the software is maintained, and how to report errors in the software to the developers (Diakopoulos, 2017; Sokol et al., 2019; Stoyanovich, 2019). There are very few established written rules around this; it is simply considered good practice among much of the technology community (Burton et al., 2017; Fuller et al., 2017).
A concept of accountability by design could combine the core principles of the SPIRIT model with the best aspects of what is considered good conduct in technology research. The model could look something like this (Fig. 1). To provide an overview of some of the relevant questions that could be asked of each stage in the process: […]
Of course, this list cannot be exhaustive; however, it does present a set of interesting questions that provide a much richer and deeper overview of technology research than is typically the case. Having access to this sort of information might not be appropriate in all areas of research; however, in areas of high-risk research, which is likely to have large impacts on society, this should definitely be considered. As technology research, like medical research, increasingly falls into this category, it would be wise to consider more accountable approaches in the area of technology research as well.
This is particularly the case as a large number of research outputs in technology research are based on human-computer interaction between the researcher, or teams of researchers, and automated technical systems (Wagner, 2016). Should these systems be engineered in a manner that is more likely to produce certain forms of research output, it would be important to know this and be able to study it systematically. Providing a rigorous mapping of the scientific journey of a research project, including the story behind the project and the paths not taken during it, is key to understanding how scientific fields develop and to ensuring they do so in an accountable manner.

Ensuring a lightweight process through automation
In order to simplify the process and make it as lightweight as possible, it would be helpful to ensure that as much of the process as possible is automated. Thus, an additional layer of software plugins could be inserted to speed up and automate the process of storing key log data which is already generated by data analysis software. This intermediating data should provide an additional layer of simplicity and ease in generating the entire project in a systematic and straightforward way (Fig. 2). While we will go into this process in a systematic way at a later stage, providing access to these logs in an automated fashion allows all manner of automated research errors to be found more easily. For example, recent analysis of publicly accessible data sources found that many contained errors due to a formatting error in Excel that automatically converts certain types of data into another format. Being able to apply such automated techniques across a larger repository such as the one proposed here, and to inform researchers of possible mistakes before papers are published, would certainly be helpful, as there are likely to be many more mistakes present than just those found in Excel (Ziemann et al., 2016). Of course, such automated support could not be forced on researchers within the repository but would rather form part of 'standard practice' for ensuring that the automated techniques implemented have not produced erroneous results.
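The kind of automated check described above can be sketched in a few lines. The example below is a minimal illustration, not the procedure used by Ziemann et al.: it assumes the error of interest is a spreadsheet turning identifiers such as gene symbols into date-like strings (for example "SEPT2" becoming "2-Sep"), and the function name, pattern and sample column are all hypothetical.

```python
import re

# Assumed pattern for values that look like spreadsheet-converted identifiers,
# e.g. "2-Sep" or "1-Mar". Real checks would need more formats than this one.
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
_DATE_LIKE = re.compile(rf"^\d{{1,2}}-({_MONTHS})$", re.IGNORECASE)


def flag_date_like_values(column):
    """Return (row_index, value) pairs whose value looks like an auto-converted date."""
    return [(i, v) for i, v in enumerate(column) if _DATE_LIKE.match(str(v).strip())]


# Hypothetical identifier column from a deposited supplementary file.
gene_column = ["TP53", "2-Sep", "BRCA1", "1-Mar", "DEC1"]
suspects = flag_date_like_values(gene_column)
print(suspects)
```

Run across every tabular file in a repository, a check of this kind would only produce warnings for the researcher to review; it cannot decide on its own whether a flagged value is actually wrong.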

Governance, access and external accountability
Issues seldom discussed around research repositories are those of access, governance and privacy. These are important considerations, as access repositories serve numerous distinct functions within a specific scientific context, which may require a variety of different roles and access levels to be created. For example, many universities operate and manage public repositories of their own academic research that are simultaneously used as proof of work for academics to show their employer that they are engaging in research, and to derive metrics of effectiveness for university administration across a variety of different departments (Bryant et al., 2018). Not all of these roles may be conducive to ensuring greater accountability for academic research, nor will all of them be compatible with ensuring higher quality academic research or safeguarding academic freedom. They may also be questionable in the context of the GDPR, where both privacy and accountability are limited through the confusion of multiple roles of an individual user within one single system. When these roles are combined in one mutually dependent system, it is difficult for the individual user to understand the purpose of the data being gathered, to ensure that the purposes of data gathering are limited, and to opt out of certain uses of their data.
In the system proposed here, ownership of and control over repositories is extremely important, as shifts in ownership may impede or enable different forms of accountability, depending on how repository ownership evolves over time. Thus many academics who used the Social Science Research Network (SSRN) paper repository have stopped using it since it was purchased by Elsevier (Leeper, 2016). Similarly, it is an open question whether GitHub can be considered an appropriate venue for source code sharing now that it is owned by Microsoft (Silver, 2018). Reliable infrastructures for this purpose are likely to exist in the non-profit domain, with solid public funding and outside the immediate control of funding authorities, academic journals or individual universities.
Most repository models tend to prefer fully open access systems in which everything is accessible to the general public. In the context of the systems discussed here, it might be wise to apply a differentiated approach to privacy and access, where the main researchers decide which levels of access they wish to give to which actors. For example, high-level data and results could be provided to the general public, with more granular data provided to peer reviewers upon request. Additional data could be provided to the public if any claims are made that the research was developed in an incorrect manner or if it was unclear how the research claims were developed. This would also ensure that the stealing of ideas or research proposals and other unethical practices would become far less likely, as the information would be stored in the repository but would only slowly and systematically be provided to other communities and the general public at the discretion of the researcher.
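The staged access described above (summary results for the public, granular data for reviewers, further release at the researcher's discretion) could be represented roughly as follows. This is a minimal sketch under assumed names: the three tiers, the item names and the `release` helper are hypothetical and do not describe any existing repository.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical access tiers, ordered from least to most privileged."""
    PUBLIC = 0
    REVIEWER = 1
    RESEARCHER = 2


# Hypothetical deposit: each item carries the minimum tier allowed to read it.
deposit = {
    "summary_results": Tier.PUBLIC,
    "granular_data": Tier.REVIEWER,
    "analysis_logs": Tier.RESEARCHER,
}


def visible_items(deposit, tier):
    """List the items a reader holding the given tier may see."""
    return sorted(name for name, minimum in deposit.items() if tier >= minimum)


def release(deposit, item, new_minimum):
    """Return a copy of the deposit in which the researcher has changed one item's tier."""
    return {**deposit, item: new_minimum}


public_view = visible_items(deposit, Tier.PUBLIC)
reviewer_view = visible_items(deposit, Tier.REVIEWER)
# The researcher later opens the logs, e.g. after the analysis is questioned.
opened = release(deposit, "analysis_logs", Tier.PUBLIC)
public_view_after = visible_items(opened, Tier.PUBLIC)
print(public_view, reviewer_view, public_view_after)
```

Keeping the decision in a single `release` step reflects the point made in the text that disclosure stays at the discretion of the researcher rather than of the repository operator.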

Procedural and technical safeguards
Finally, the model of accountability proposed here centres around procedural safeguards. Steps would need to be taken to minimise the danger of researchers being coerced into providing data within the archive without their consent. This tension between the rights of employees and employers is common in understanding privacy at work. "Every day, in every workplace, these competing interests meet, requiring a balance to be achieved between the employer's need for information and the employee's need for privacy" (Keane, 2018, p. 355). Moreover, providing a researcher-centric system which gives researchers agency over how they present information and their own research would diverge considerably from the current institution-driven system, in which researchers are incentivised to provide information in return for funding or employment.
Overcoming this tension is important, as it constitutes a key barrier to meaningful accountability, which stems from uptake of the system. These procedural safeguards are designed to ensure that researchers and their academic freedom and individual agency are protected, as well as the right to privacy under the GDPR. The model could also include technical safeguards such as end-to-end encryption of data stored in the repository, to ensure that sharing of data is only possible with the explicit consent of the researchers involved.
At the same time, standard practices in individual disciplines would ensure that basic minimum standards are upheld. The greater the potential risk to the public of incorrect or misleading research being published, the higher these minimum standards would need to be (Wagner, 2014). Thus, certain journals could over time adopt basic minimum standards aligned with this accountability model. The same could be the case for certain faculties within academic institutions that wish to ensure accountable research.

Integrating existing accountability mechanisms
Finally, it is important to ensure that the platform being discussed here integrates the accountability mechanisms that already exist in the research process. Specifically, entering the data in the research platform could provide the researcher with the possibility of forwarding parts of the data they have stored in the platform to the relevant ethical review board or of applying for data protection clearances. These types of streamlined processes do not currently exist, leading to considerable procedural and organisational duplication which limits the ease with which existing accountability mechanisms can be used.

Existing systems that could fit to the proposed accountability model
The described model will increase accountability through a consistent, human-understandable narrative. This means that throughout the different stages of the creation of the product (often written in code), which is in some cases the result of unstructured or non-linear "ad-hoc" programming, a human-readable file of the creation process and its steps is written automatically as a by-product (Guardia and Sturdy, 2019).
This way of documenting the writing process was invented in 1984 by Donald Knuth and is referred to as 'literate programming'. "The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, […], chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other" (Knuth, 1984, p. 1). In literate programming the code and the narrative are not seen as separate components; rather, they form one concept. The products created with tools that enable literate programming are often referred to as 'dynamic documents.' Guardia and Sturdy explain the mechanical process of the creation of the dynamic document as follows: "the narrative component is written into the code as comments, then the code is run through the given statistical software (i.e. R, Stata, Python) and a log file that contains output and narrative is generated (with a specific Markdown format)" (Guardia and Sturdy, 2019, p. 39). The log file can then be converted into several other data formats that ensure readability, such as PDF or Word documents (Samavi and Consens, 2014).
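The mechanical process quoted above (narrative in comments, code run, Markdown log produced) can be illustrated with a deliberately small sketch. It is not how RMarkdown, Markdoc or any other tool named in this article works internally; the `weave` function, its comment convention (lines starting with "# " at column zero count as narrative) and the sample script are assumptions made for illustration.

```python
import contextlib
import io


def weave(source):
    """Turn commented code into a Markdown log: top-level comments become
    narrative, code chunks are executed and shown with whatever they print."""
    namespace = {}      # state shared between code chunks
    markdown = []       # pieces of the final document
    code_chunk = []     # code lines collected since the last narrative line
    fence = "`" * 3     # Markdown code fence, built here to keep this listing clean

    def flush():
        # Execute the pending chunk and append code plus captured output.
        if not code_chunk:
            return
        code = "\n".join(code_chunk)
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
        markdown.append(f"{fence}python\n{code}\n{fence}")
        if buffer.getvalue():
            markdown.append(f"Output:\n\n{fence}\n{buffer.getvalue().rstrip()}\n{fence}")
        code_chunk.clear()

    for line in source.strip().splitlines():
        if line.startswith("# "):
            flush()
            markdown.append(line[2:])
        elif line.strip():
            code_chunk.append(line)
    flush()
    return "\n\n".join(markdown)


# Hypothetical analysis script with its narrative written as comments.
script = """
# We start from three hypothetical measurements.
values = [2, 4, 6]
# The mean is the only statistic reported.
print(sum(values) / len(values))
"""
report = weave(script)
print(report)
```

Because a narrative comment closes the current chunk, this sketch only works when comments sit between complete statements; real tools handle chunk boundaries explicitly.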
The dynamic document tools differ from statistical tool to statistical tool, especially because some are open source software like R and some are commercial software like Stata (where Markdoc can be used) or SPSS (Haghish, 2016). One system developed for use with R packages or R Shiny is called "adapr". This system uses cryptographic file hashes for the structure of the "accountable units" (parts of the code) and works with RMarkdown as a literate programming language. The aim and the characteristics of the accountable unit are summarised as follows: "We call these files accountable units because the system can provide their provenance, meaning the data, code dependencies, date, and authorship" (Gelfond et al., 2018, p. 2). This way of identification and storage can provide additional information about the creation process and about the context or position of each unit.
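To make the role of cryptographic file hashes more concrete, the following sketch records the kind of information the quotation lists (data, code dependencies, date, authorship) for one analysis script. It is written in Python rather than R and does not reproduce adapr's actual interface; the function names, record layout and file names are hypothetical.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path):
    """Hex digest identifying the exact contents of a file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def provenance_record(script, inputs, author):
    """Describe one hypothetical 'accountable unit': code, data, date, author."""
    return {
        "script": {"path": str(script), "sha256": file_sha256(script)},
        "inputs": [{"path": str(p), "sha256": file_sha256(p)} for p in inputs],
        "author": author,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "survey.csv"        # hypothetical input data
    code = Path(tmp) / "analysis.py"       # hypothetical analysis script
    data.write_text("id,score\n1,3\n")
    code.write_text("print('analysis')\n")

    record = provenance_record(code, [data], author="A. Researcher")
    print(json.dumps(record, indent=2))

    # Any later edit to the data changes its hash, so the mismatch is detectable.
    data.write_text("id,score\n1,4\n")
    changed = file_sha256(data) != record["inputs"][0]["sha256"]
    print("input changed since record:", changed)
```

The design choice worth noting is that the hash identifies file contents rather than file names, so a silently edited dataset no longer matches the record even if its path is unchanged.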
Another system developed for R is the 'cacher' package. "The cacher package takes objects created by evaluating R expressions and stores them in key-value databases" (Peng, 2008, p. 1). Storing objects in this way makes collaborative work easier to share online. There are two main users of the system: the author, who focuses on the statistical analysis and for whom the program mainly smooths the distribution and documentation process, and the reader, whose focus is on the readability and understanding of the document shared by the author. Other literate programming tools that work in a similar way are knitR (Liu and Pounds, 2014), cacheSweave (Peng, 2004), Sweave (Leisch, 2002), SASweave (Lenth and Højsgaard, 2007) and ESS, an add-on package for GNU Emacs that is compatible with R, S-Plus, SAS, Stata and OpenBUGS/JAGS. Peng also refers to the odfWeave package of Kuhn and Weaston (Max Kuhn, 2006) for an entirely open source workflow that ensures the compatibility of R and OpenOffice.
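The idea of storing evaluated objects in a key-value database can be sketched as follows. This is a Python illustration of the general technique, not the cacher package's interface: the `ExpressionCache` class is hypothetical, a plain dictionary stands in for the database, and keys are derived from the expression text alone, which means a stored result is not invalidated when the objects it depends on change.

```python
import hashlib
import pickle


class ExpressionCache:
    """Evaluate expressions once and keep the serialised results by key."""

    def __init__(self):
        self.store = {}       # in-memory stand-in for a key-value database
        self.namespace = {}   # names defined by earlier expressions

    def evaluate(self, name, expression):
        # Key derived from the expression text (a simplifying assumption).
        key = hashlib.sha256(expression.encode("utf-8")).hexdigest()
        if key not in self.store:
            value = eval(expression, self.namespace)
            self.store[key] = pickle.dumps(value)
        value = pickle.loads(self.store[key])
        self.namespace[name] = value
        return key, value


cache = ExpressionCache()
key_a, data = cache.evaluate("data", "[1, 2, 3, 4]")
key_b, total = cache.evaluate("total", "sum(data)")
_, again = cache.evaluate("total", "sum(data)")   # served from the store
print(total, again, len(cache.store))
```

A reader who receives the store and the list of keys can load intermediate objects without rerunning the analysis, which is the sharing scenario described above.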
RMarkdown is only one popular solution for the creation of dynamic documents; others are Jupyter Notebooks (Wes, 2012) or TensorFlow Distributions (Dillon et al., 2017). The first has a deep link to programming in R, while the other is associated with Python, as it is understood to be program-agnostic (Guardia and Sturdy, 2019). Python, as one of the main programming languages used for AI and machine learning, has distinctive needs for tools that increase accountability and transparency. One open source package is 'FAT Forensics', developed by Sokol et al. This package is "designed as an interoperable framework for implementing, testing and deploying novel algorithms invented by the FAT research community and facilitate their evaluation and comparison against the state-of-the-art ones, therefore democratising access to these techniques. In addition to supporting research in this space, the toolbox is capable of analysing all artefacts of the machine learning process - data, models and predictions - by considering their fairness, accountability (robustness, security, safety and privacy) and transparency (interpretability and explainability)" (Sokol et al., 2019, p. 1).
To further ensure explainability as well as transparency, and to tackle the black-box problem of AI and machine learning, the package can generate implicit or explicit counterfactual explainers. Other commercial programs would be Microsoft/Interpret, IBM AI Explainability 360 or Oracle/Skater. Additional open source applications would be AI Fairness 360, Black Box Auditing or Fairlearn (Sokol et al., 2019).
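A counterfactual explanation answers the question of what smallest change to an input would have altered a model's decision. The sketch below illustrates the idea with a brute-force search over single-feature changes; it does not use or imitate the FAT Forensics API, and the decision rule, feature names and step range are hypothetical stand-ins for a trained model.

```python
def predict(applicant):
    """Toy, hypothetical decision rule standing in for a trained model."""
    score = 2 * applicant["income"] - 3 * applicant["debts"]
    return score >= 10


def counterfactual(applicant, steps=range(1, 11)):
    """Return the smallest single-feature change that flips the prediction."""
    original = predict(applicant)
    for step in steps:                       # try small changes first
        for feature in applicant:
            for delta in (step, -step):
                candidate = dict(applicant, **{feature: applicant[feature] + delta})
                if predict(candidate) != original:
                    return feature, delta
    return None                              # nothing within the searched range


applicant = {"income": 6, "debts": 2}        # score 12 - 6 = 6, so rejected
decision = predict(applicant)
explanation = counterfactual(applicant)
print(decision, explanation)
```

Here the explanation reads as "had income been two units higher, the decision would have been different", which is the kind of account that can be recorded alongside a model's output.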
To further ensure accountability throughout the research process, we have to think not only about the program with which the technical research is created, but also about who is doing the research and how easily scholars and their work can be found and accessed. Initiatives that foster such an understanding of research are often referred to as 'open science' or 'open scholarship'; the latter term is preferable because it does not point to a specific field or area of research (Ayris et al., 2018). According to the OECD, open science should improve efficiency in science by reducing duplicated costs, adding value through the reuse of data and making participation easier. It should also increase transparency and quality, speed up the transfer of knowledge, increase spillover effects from the scientific world to the economic world, allow a more efficient reaction to change and foster the engagement of citizens (OECD, 2015). The benefits of open science within the research workflow are summarised by the League of European Research Universities. They mention the wider visibility of research output enabled by the internet, which fosters the so-called 'citation advantage' (Ayris et al., 2018). According to the OECD, this advantage can be seen as a self-fulfilling prophecy: because it is easier to cite, the publication is simply cited more often. The OECD further argues that another side effect of the citation advantage is a decrease in 'quality bias': because practitioners publish only their best-quality papers online, this excellence enables an increase in the quality of research (OECD, 2015).
The League of European Research Universities further points to four benefits: greater insight into the research process when the data and methodologies used are available to the public; increased visibility of the product, which enables the user to trace the line of argumentation through the different steps of the research process; a defined minimum of transparency about the research data used, provided through access to the dataset; and the acknowledgement that the researchers involved receive because their work is easily found through the identifiers used in the process (Ayris et al., 2018).
These benefits add value for the key actor in the research workflow: the researcher. The European Commission analyses the motivation of this actor, which is often driven by the scientific reward system. The researcher is therefore both an information seeker and a status seeker (European Commission, 2019). These reward systems are created and enforced by the second group the European Commission identifies: the universities and research centres. They are motivated because "many institutions attempt to craft their incentives and assessment tools to secure better national and international rankings" (European Commission, 2019, p. 10).
The third actor identified in the process is the publishers. Publishers, whether commercial or non-profit, differ in their motivation from the other groups because they want to create a strong brand through their selection of research and therefore compete with one another. The picture of key actors given by the OECD is more holistic. It includes: "Researchers themselves, Government ministries, Research funding agencies, universities and public research institutes, Libraries, repositories and data centres, Private non-profit organisations and foundations, Private scientific publishers, Businesses, supranational entities" (OECD, 2015, pp. 12-13).
The benefit of greater visibility can be achieved through better access to publications, which is referred to as 'open access' (OA). OA can be further divided into three categories: gold, green and hybrid OA. The first describes cases where the article is made publicly available through the journal publisher. The second refers to cases where the publication is made publicly available indirectly, for example when the paper is uploaded after its release to another website without a paywall or any other access restriction. The third is a commercial model in which the user has to pay for some content while other content is accessible for free (Laakso and Björk, 2012).
Publication repositories can help to structure a researcher's search process (Peknikova, 2006). Repositories are also used to structure the research work of staff. Examples are the EUDAT Collaborative Data Infrastructure15 and Zenodo,16 an initiative of CERN and OpenAIRE to form a complementary archive that provides the frame for the European perspective. Individual universities use similar models, such as WU FIDES (Research Service Center, 2015), the Lund University Research Portal17 and the CUED (Cambridge University Engineering Department) Publication database.18 The European Research Council also stresses the importance of certification for data repositories, such as the Core Trust Seal,19 ISO 16363,20 or the Nestor seal,21 and recommends general repositories for research data such as Zenodo,22 Dryad,23 the Open Science Framework24 or Harvard Dataverse25 (European Research Council, 2019, p. 6).
Through open science and its tools, both the efficiency of the research process and the diffusion of the research itself can benefit (OECD, 2015). To guarantee an environment for open science, the data as well as the analysis "need to be publicly and persistently available via a stable, freely available archive" (Leeper, 2014, p. 1). The form that can be used, according to Leeper, is called a 'dataverse', where the title and the metadata are the baseline of information for the user. This network provides storage for "not just a free-standing data file, but associated files in almost any format and one might include files such as data, codebooks, analysis replication files, statistical packages, questionnaires, experimental materials, and so forth" (Leeper, 2014, p. 2). The data in the dataverse is converted to a generic tabular format and can be re-converted after download to a format that common statistical programs can use. The data corpus in the dataverse is easily identifiable through a digital object identifier (DOI), and the separate steps in its creation can be followed through version control.
To ensure that the desired academic author is found, the League of European Research Universities promotes the ORCID ID, which is "a persistent digital identifier that distinguishes you from every other researcher and, through integration in key research workflows such as manuscript and grant submission, supports automated linkages between you and your professional activities ensuring that your work is recognised" (Ayris et al., 2018, p. 11). If a project is funded through a grant from the EU H2020 programme, a data management plan (DMP) is obligatory under the Open Research Data pilot (European Research Council, 2019). This plan has to be created within six months of the start of the project and should cover the following categories of information: "Data set description, Standards and metadata, Name and persistent identifier for the data sets, Curation and preservation methodology and Data sharing methodology" (European Research Council, 2019, p. 3).
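These two requirements, the five information categories and the six-month deadline, lend themselves to a simple automated check. The following is a minimal sketch under stated assumptions: the function names `missing_sections` and `dmp_deadline`, and the representation of a plan as a dictionary of section texts, are hypothetical and not part of any official tooling.

```python
import calendar
from datetime import date

# The five categories quoted above (European Research Council, 2019, p. 3).
REQUIRED_SECTIONS = (
    "Data set description",
    "Standards and metadata",
    "Name and persistent identifier for the data sets",
    "Curation and preservation methodology",
    "Data sharing methodology",
)


def missing_sections(plan):
    """Return the required categories that are absent or left blank."""
    return [name for name in REQUIRED_SECTIONS
            if not plan.get(name, "").strip()]


def dmp_deadline(project_start, months=6):
    """Return the date `months` calendar months after the project start."""
    month_index = project_start.month - 1 + months
    year = project_start.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp the day for shorter months (e.g. 31 August -> end of February).
    day = min(project_start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


draft = {
    "Data set description": "Interview transcripts and sensor logs.",
    "Standards and metadata": "",
}
todo = missing_sections(draft)
due = dmp_deadline(date(2019, 1, 15))
```

A check of this kind only confirms that each category has been addressed; whether the content is adequate remains a matter for the funder and the researcher.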
The European Research Council also gives advice on how to find a safe repository for research data26 using the FAIR principles: store the data in a way that makes it findable, accessible, interoperable and re-usable. To support the researcher in the process of writing up research, reference managers are a commonly used tool. These tools can be summarised under the following definition: "Reference management software maintains a database to references and creates bibliographies and the reference lists for the written works" (Basak, 2014). Commonly used software includes Zotero, EndNote, Procite, Mendeley and Refworks (Dixon and Francese, 2012). According to Basak, the main users are researchers, scientists, technologists and authors.
Reference managers are an enabler of transparency, which includes functions like "data archiving, qualitative databasing, hyper-links, traditional citation, and active citation" (Moravcsik, 2014, p. 3). The last of these helps to implement a baseline standard of citation that uses current technological capabilities. The citation includes a hyperlink to an "annotated excerpt from the original source, which appears in a 'transparency appendix' at the end of the paper, article, or book chapter" (Moravcsik, 2014, p. 3). The appendix Moravcsik describes contains four parts: a copy of the full citation, a summary of the source (50-100 words), an explanation of why the source supports the claim and a link to the whole cited document. Through these multiple dimensions of documentation, and the logical path of argumentation for why a source was used, the program should nudge authors to make their line of thought more transparent and make the whole research process easier to understand for the multiple stakeholders.
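The four-part structure of such an appendix entry can be captured in a small data structure, which is one way a reference manager could nudge authors towards complete entries. The sketch below is an assumption made for illustration: the class `TransparencyEntry`, its field names and the `problems` method are hypothetical and do not describe any existing tool.

```python
from dataclasses import dataclass


@dataclass
class TransparencyEntry:
    """One entry of a transparency appendix, following the four parts above."""

    full_citation: str   # copy of the full citation
    summary: str         # summary of the source, 50-100 words
    justification: str   # why the source supports the claim
    source_link: str     # link to the whole cited document

    def problems(self):
        """List what is still missing or out of range in this entry."""
        issues = []
        for name in ("full_citation", "summary",
                     "justification", "source_link"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        words = len(self.summary.split())
        if self.summary.strip() and not 50 <= words <= 100:
            issues.append(f"summary has {words} words, expected 50-100")
        return issues


entry = TransparencyEntry(
    full_citation="Moravcsik (2014), p. 3",
    summary="Describes active citation in a single short sentence.",
    justification="Defines the transparency appendix used in the text.",
    source_link="https://example.org/placeholder",  # placeholder URL
)
remaining = entry.problems()
```

Here the entry is flagged because its summary is shorter than the 50-100 words the appendix calls for; an empty list would indicate that all four parts are present.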

5. Challenges with such an approach
However, there are also challenges with an approach that focuses on creating greater accountability in technology research. As was noted above, different levels of disclosure may be necessary for different actors, which would favour a differential privacy (Dwork, 2008, 2011) approach that includes both procedural and technical safeguards. While such an approach would diverge from the pure 'public access' approach, it is closely aligned with the accountability principle and the privacy by design approach within the GDPR. It is also hoped that providing greater levels of researcher control over document publication will increase user acceptance of the proposed system. Even if this user acceptance is a given, some challenges still remain:

Degree of vagueness in documentation process?
It is possible that data is entered into the system that does not accurately reflect the actual research taking place. This is particularly likely when the information is used for purposes other than ensuring research accountability without the consent of the researcher involved. Researchers who have an incentive to present their research in a certain way are less likely to do so accurately than those who lack such incentives. It is therefore important to align the incentives of such a system so that it produces maximum accountability for research output while researchers in turn feel comfortable (en)trusting the system with their ongoing research outputs. Similarly to the specificity of consent mentioned in GDPR recital 32, it is crucial to ensure that the data entered into the system is "specific, informed and unambiguous" (Regulation, 2016, p. 6).

Building an internal academic surveillance system
One of the main challenges around creating an internal log of relevant research activity is the danger that it becomes an internal academic surveillance system. Documenting too much information can lead to a situation in which individual researchers are limited in their ability to do research, or in which the system is misused against them. Thus, it is highly important that researchers themselves decide which data they wish to enter into that system and with whom they want to share that data. Especially given the challenging institutional relationships that govern current repositories, there is a possibility that academics will be coerced into providing personal and professional data within these systems as part of a workplace or other professional arrangement (Keane, 2018). While giving researchers this control does not completely resolve all issues around the potential creation of a surveillance system, it may help to mitigate them.

Data can be (mis)-used for purposes it was not initially intended for
Another challenge related to the provision of open datasets online is the potential use and/or misuse of the research data provided. While providing open datasets online is an important contribution to research, it is less clear to what extent these datasets will be used in a positive way or in the ways the researcher intended. In order to ensure that this is the case, there is a need to debate licensing issues around the provision of publicly accessible data. Simply saying that the research data is open for all possible uses is an insufficient ethical response and limits the opportunity for a rigorous academic debate on potential consequences and harms.
A similar debate has been going on for many decades as part of the open source and free software movement, to better understand in which contexts software can be used and whether meaningful restrictions can be placed upon its use.27 There has also been an increasing debate about licenses such as the Qabel License or the First Do No Harm license, which explicitly attempt to restrict certain types of problematic behaviour.28 Similar debates are urgently needed around the provision of open datasets to ensure that researchers are aware of, and willingly engage with, the potential positive and negative consequences of publicly sharing datasets, while also using licensing mechanisms to ensure that potential negative consequences are mitigated to the best of their ability.

Limiting academic freedom
Another challenge in this context is that the existence of the system may be used as a mechanism to limit academic freedom. Researchers might be less willing or less able to conduct the types of research they believe are necessary if that research needs to be documented more extensively than is currently the case. While giving researchers greater control of the mechanisms of disclosure may mitigate some of these challenges, it does not fully remove them. As a result, it may not be wise to implement this model across the board for all types of academic study. Rather, it represents a specific model of research accountability that could be relevant in particularly high stakes cases, where the technology research being conducted is likely to have a high societal impact.

Putting pressure on research methods that do not fit into this model
Another important concern with a model like this is that many existing research methods may not fit well into it. Certainly, the messiness of certain ethnographic or qualitative approaches to technology research in areas such as co-design or user-centred design may be difficult to integrate into this model. An associated problem is that the model may promote types of research that are specifically designed to fit into this framework, thus gaming the model. This challenge would be particularly pronounced if the model were to become mandatory in a significant number of areas; the resulting pressure would definitely be a cause for concern.

Conclusions
A lot is at stake in technology research. We are reshaping entire societies across the world based on the results of technical research, whether it is building 'smart' cities or rolling out self-driving cars. It is only reasonable that the societies influenced by many of these high impact technologies would also want those conducting research on these topics to be accountable for the results they produce. This cannot be the case for all areas of science, as this would be far too restrictive of academic freedom. But in some high-risk areas, greater accountability would go a long way to ensuring that accountable research is conducted systematically. Such high-risk areas could include, for example, research on developing self-driving cars or smart cities. Many of the issues around where a research project went wrong are currently only discussed at science slams. Things that went wrong, roads not taken and many of the choices made are often not part of the final journal publication. While this may be considered acceptable for some areas of research, it poses a considerable challenge for high stakes technology research. In these areas, there is a need for research models which provide for a greater degree of accountability while ensuring that researchers are able to fine-tune these systems however they see fit.
More broadly, it can be argued that all research should strive towards ensuring greater accountability (Haraway, 1988; Harding, 1992) rather than claiming neutral objectivity. This approach is also consistent with the GDPR accountability principle and the privacy by design duty, ensuring a sound legal basis for developing more accountable academic research. Acknowledging that claims to producing an objective truth may be overblown does not mean taking an 'anything goes' approach; if anything, it means taking scientific accountability more seriously in all areas of science (Harding, 1992; Law, 2017). In order to achieve this in technology research, it is important to remember both the incredible impact technology research can have and the very real challenges of accountability of widely implemented technologies that can currently be observed in the world (ACLU et al., 2018; Keller, 2019; Nissenbaum, 1996; Singh, 2018; Sokol et al., 2019).
Encouraging a process of accounting and accountability does not have to mean creating a system of surveillance that limits academic freedoms. Steps need to be taken to ensure that greater accountability does not limit scientific enquiry or prevent innovative research from taking place. However, it can be hoped that taking the best out of existing accountability methodologies from medicine and computer science may contribute to building a framework for greater accountability within technology research. The framework is based not just on the provision of code or data, but on a granular narrative of the research process and the decisions made on the way. Neither data nor code, but instead the research process itself is the story that needs to be told and captured. By ensuring that it is accounted for, it may be possible to ensure greater accountability in the technology research process.

Fig. 2 - Overview of proposed accountability model with software integration.