Co-Creating Autonomy: Group Data Protection and Individual Self-determination within a Data Commons

Recent privacy scandals such as Cambridge Analytica and the Nightingale Project show that data sharing must be carefully managed and regulated to prevent data misuse. Data protection law, legal frameworks


Introduction
Rapid technological innovation in our data-driven society (Pentland, 2013) has changed how data subjects (those about whom personal data are collected) interact with data controllers (those who collect and determine what these data are used for). Privacy scandals such as Cambridge Analytica secretly harvesting 50 million Facebook profiles to build models to influence elections,¹ widespread personal data sharing in the Google Nightingale Project,² and the intrusion into private life³ and society⁴ have made individuals more cautious about the information that they put online. However, data subjects are often left out of the conversation with regard to data protection (protecting data subjects' personal data in relation to the processing of data, where processing refers to any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, storage, or use). While certain laws and technologies attempt to encourage data subject participation and provide them with the ability to control their personal data, this is insufficient as it relies on data subjects having a high level of understanding of both the law and the resources available for individual redress. Moreover, such redress usually arises after data collection and sharing, by which time the damage may already have been done.
In this paper, we introduce a new framework, the data commons for data protection, in an attempt to improve data subject participation in the data protection process through collaboration and co-creation. The paper is outlined as follows. First, we explore the challenges facing data subjects who feel helpless as a result of the increasingly sizeable data controllers who collect, process, and share their personal data. We identify some of the solutions to this problem and also explore how these may be inadequate. We then introduce the commons as a potential framework for bridging the data protection divide, before detailing data curation as a use case for the data commons, examining what similar frameworks have been established in this space and how our data commons can aid better data protection for data subjects. Finally, we conclude that a co-created data commons can protect individual autonomy over personal data through collective curation and rebalance power between data subjects and controllers, establishing the requirements of a data commons to help data subjects and exploring how this could work in the context of data curation.

The Data Protection Divide
Under existing legislative frameworks and the available technological solutions, data protection focuses on putting responsibilities on data controllers and enforcement. However, this continues to place pressure on individual data subjects to protect their own personal data and seek individual remedies in case of breaches, rather than including them in discussions that can help shape data protection policies and better protect personal data.

The Law and Legal Frameworks
Laws such as the European General Data Protection Regulation (GDPR) and the California Consumer Privacy Act attempt to rebalance power between data subjects and data controllers. As European legislation is already in effect, we focus our work on the regulation and supporting information rights relating to the protection of personal data. The GDPR came into force on 25 May 2018, introducing significant changes by acknowledging the rise in international processing of big datasets and increased surveillance both by states and private companies. Data subject rights offered by the GDPR include the right of access (Article 15), the right to be forgotten (Article 17), and the right not to be subject to a decision based solely on automated processing (Article 22). The GDPR has also clarified the means for processing data, whereby if personal data are processed for scientific research purposes, there are safeguards and derogations relating to processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes (Article 89), applying the principle of purpose limitation (Article 5). Information rights such as the right of access to recorded information held by public sector organisations through the Freedom of Information Act and intellectual property rights offered under the Directive on Copyright in the Digital Single Market can support data protection for data subjects by offering greater transparency and alternative remedies to issues relating to personal data.
However, even with legal protection, data subjects continue to be relatively powerless when exercising their rights against the increasingly sizeable and international data controllers (Edwards, 2019). Although people are more aware of their data subject rights, these are not well understood (Norris et al., 2017). Only 15% of EU citizens indicate that they feel completely in control of their personal data (Clusters et al., 2019). Evaluating location-based services, Herrmann et al. (2016) found that individuals do not necessarily know all the inferences that are made using their data and thus do not know how they are used. Importantly, individuals are unaware of, and unable to correct, false inferences, making the collection, transfer, and processing of their location data entirely opaque. With privacy policies written in legalese and privacy-protecting options hidden beneath "dark patterns", data subjects cannot easily find out how their data are reused, aggregated, and anonymised to make decisions about them (Utz et al., 2019). Additionally, laws focusing on placing data protection responsibilities on data controllers and empowering enforcement bodies assume that data controllers understand how to implement those responsibilities and that enforcement is successful. Data protection officers' and authorities' enforcement practices are inconsistent and unclear due to lack of guidance (Norris et al., 2017). Data controllers responding to GDPR Article 20 right to data portability requests provided a wide variety of file formats that were not all GDPR compliant and confused the right with other data subject rights (Wong and Henderson, 2019). Kamarinou et al. (2016) found inconsistencies in detail and a lack of transparency about third-party storage and the processing of personal data in cloud service providers' terms and privacy policies. Funding for data protection authorities may also be limited, especially in comparison to the large multinational corporate data controllers. For example, the Irish Data Protection Commission was only given 27% of its requested increase by the Irish Government, totalling €21.1 million, despite increased responsibilities post-GDPR (The Irish Times, 2019).
Other legal frameworks have also been considered as a means to protect data subjects. Data trusts have been proposed as a legal framework for data stewardship and data management (Open Data Institute, 2019). A data trust is a legal structure that facilitates the storage and sharing of data through a repeatable framework of terms and mechanisms. Data trusts aim to overcome difficulties for data sharing and assure the credibility, trustworthiness and reliability of the resulting data analysis (Pinsent Masons, Queen Mary University, and BPE Solicitors, 2019). A data trust aims to respect the interests of those with legal rights in the data, ensure that the data is used ethically according to the rules established by the data trust, and collectively manage individual rights and interests. A key feature of data trusts is that they create rules to govern data sharing where a custodian or steward makes decisions on behalf of data users and subjects. However, this may still leave data subjects out of the data protection process. Without direct data subject engagement, decisions are made on their behalf by trustees as opposed to with them, representing the data trust and not the data subject. Data trusts also retain data protection enforcement issues, as they rely on the trust to respond to such challenges, delegating data protection responsibilities (Open Data Institute, 2019). A data trust could in theory respond to certain data subject rights, but it would be difficult to mandate rights to portability, access, and erasure that rely on the data controller (Delacroix and Lawrence, 2019). An alternative mechanism that takes this into consideration is data collaboratives, which focus on harnessing privately held data towards the public good through collaboration between different sectors.⁵ However, individuals and groups of data subjects are still excluded from participation where they are only the potential beneficiaries and are not part of designing the data collaborative framework. A report by Pinsent Masons, Queen Mary University, and BPE Solicitors (2019) suggests that both legal reform in data protection and technical considerations should be used to ensure that a suitable framework preserves the core rights of data subjects and balances the societal benefits from data sharing against the interests of the data subjects. Although data protection law and other legal mechanisms provide data subjects with data subject rights and legal protection, data subjects are only seen as beneficiaries of these frameworks and not as active participants or contributors to the practices for protecting their personal data.

IJDC | Conference Pre-print

Technological Solutions
New technologies have also attempted to give users the ability to control their own data. Some tools include Databox⁶ (a personal data management platform that collates, curates, and mediates access to an individual's personal data by verified and audited third party applications and services) and Solid⁷ (a decentralised peer-to-peer network of personal online data stores that allows users to control access to, and the storage location of, their own data). Other applications attempt to facilitate data reuse with privacy-by-design built in, such as the Data Transfer Project⁸ (an open-source, service-to-service platform that facilitates direct portability of user data), OpenGDPR⁹ (an open-source common framework that has a machine-readable specification, allowing data management in a uniform, scalable, and secure manner), and Jumbo Privacy¹⁰ (an application that allows data subjects to back up and remove their data from platforms, and access that data locally). While these tools are useful if they offer controls that limit the processing of personal data according to data subject preferences, they shift the responsibility for data protection from data controllers to data subjects. Existing tools assume that data subjects have a high-level understanding of the data subject rights they have, framing privacy as control and placing the onus of data protection on the individual. They also require data subjects to trust the companies and the technological services they provide. Further, these solutions do not offer means for collaborative data protection where information gathered from individuals could be shared amongst each other. This disenfranchises data subjects from each other and prevents them from co-creating data protection solutions together through their shared experiences.

Finding a Co-Created, Collaborative Solution
While law and technology separately attempt to address some of these concerns, they may inadequately protect personal data because they rely on a high level of understanding of both the law and the resources available for individual redress, usually after data collection. Focusing on individual protection assumes that data subjects have working knowledge of relevant data protection laws (Mahieu, Asghari, and van Eeten, 2017), access to technology, and that alternatives exist to the companies they wish to break away from (Ausloos and Dewitte, 2018).
Individuals are unaware of how their data are being used after they are collected and are disempowered from the data sharing process as they cannot identify the data controllers to exercise their rights against. Without a data breach, individuals do not know who else is affected, what data controllers and processors are using their data, or how to organise collective action to strengthen their argument for recourse. Even when notified, data subjects rely exclusively on data protection authorities to fully enforce the law on data controllers. Data subjects lack a meaningful voice in creating solutions that involve protecting their own personal data. As a result, although legal and technological mechanisms are being implemented to address existing data protection issues in our data-driven society, the focus on individual protection, asymmetry of information, and the power imbalance between data subjects and data controllers make it difficult for data subjects to engage with the data protection process.

The Commons for Protecting Data Subjects
Given the limited ability for data subjects to voice their concerns and participate in the data protection process, we posit that the protection of data from harms resulting from mass data collection, processing, and sharing could be improved by involving data subjects in collaboration and co-creation.
A framework that considers individual and group collective action, trust, and cooperation is the commons, developed by Elinor Ostrom in her seminal work 'Governing the Commons' (1990). The commons itself guards a common-pool resource that may be over-exploited to depletion. A common-pool resource (CPR) refers to a natural or man-made resource system that is sufficiently large as to make it costly (but not impossible) to exclude potential beneficiaries from obtaining benefits from its use. Ostrom created a commons framework that depends on human decisions and activities, with the community autonomously managing the CPR according to its own norms and rules (Ostrom, 1990). The commons then represents a CPR for transparency, accountability, citizen participation, and management effectiveness, where 'each stakeholder has an equal interest' (Hess, 2006). A central part of governing the commons is recognising polycentricity in decision making, a complex form of governance with multiple centres of decision-making, each of which operates with some degree of autonomy (Ostrom et al., 1961). The commons framework respects the competitive relationships that may exist when managing a CPR. Its success relies on stakeholders entering into contractual and cooperative undertakings or having recourse to central mechanisms to resolve conflicts (Ostrom, 2010). The norms created by the commons are bottom-up, as illustrated by Ostrom's case studies of Nepalese irrigation systems, Indonesian fisheries, and Japanese mountains. The structure of these commons has enabled communities to find stable and effective ways to define boundaries of a common-pool resource, define the rules for its use, and effectively enforce those rules (Ostrom, 2012).
From these case studies, Ostrom identifies eight design principles that mark a commons' success (Ostrom, 1990). Ostrom's design principles are important throughout the commons' lifecycle, where the limitations of the commons and regulation of the CPR can iterate with changes to stakeholders as part of the collective governance process.

Co-creation and Collective Action within a Data Commons
Using the theory and principles of the commons, we suggest that a commons for data protection, a "data commons", can be created to allow individuals and groups of data subjects as stakeholders to collectively curate, inform, and protect each other through data sharing and the collective exercise of data protection rights.
In a data commons, the common property is personal data from data subjects that are used for a specific purpose, and the framework incorporates existing legal and technological structures as well as data subject input and preferences. As personal data are aggregated and used to generate economic value (Singh and Vipra, 2019), data protection should move the focus away from individuals and towards groups, 'from processes of consumption to those of citizenship and accountability' (Taylor, 2017). Diaconescu and Pitt (2017) identify the need to build 'pro-social socio-technical systems' to better balance transparency and privacy, where identified pathologies stem 'from regulatory choices and associated power struggles'. The data commons aims to help contextualise privacy beyond control and move towards privacy as ability and as a state, enabling a mechanistic expectation that addressing differences will make more people comfortable with the same technologies through relationships of respect (Shklovski, 2019). Our data-driven society has also become one that has privacy dependencies, where one person's privacy is implicated by information revealed by others (Barocas and Levy, 2019). The data commons builds upon existing group theories on the risks involved in public use of anonymised personal data (Floridi, 2017) and the necessity for collective rights (Raz, 1986) both before and after data are collected. A data commons encourages iterations of individual and group data protection objectives that can be different, personalised, and change over time (Making Sense, 2018). Figure 1 shows how data subjects are the focal point of the data commons, while other stakeholders are bound by the data subject's desire for better protection of their personal data. Within the framework, the data subject should also be able to interact with data controllers, data managers, researchers, and civil society for better data protection outcomes. A data commons developed using Ostrom's design principles is useful because of the vast number of stakeholders that have a diverse set of opinions, problems, and preferences on how the data subjects' personal data are managed.
IJDC | Conference Pre-print Wong and Henderson
Figure 1. In a data commons (green), the data subject is at the centre. In this framework, the data subject and their personal data are the most important, and other stakeholders are only considered in the context of the data subject's data protection. The different stakeholders represent the polycentricity of all the systems which have influence over data subjects.
While some data commons have been established in the context of data and research archives, they focus on increasing the distribution of data rather than on data protection. Local and international attempts have been made to further open science and open access initiatives through creating research data commons. For example, the Australian Research Data Commons¹¹ (ARDC) is a government initiative that merges existing infrastructures to connect digital objects and increases the accessibility of research data. The National Cancer Institute (NCI) also has a Genomic Data Commons (GDC) that is used to accelerate research and discovery by sharing bio-medical data using cloud-based platforms. In Europe, the European Open Science Cloud¹² (EOSC) is a Europe-wide digital infrastructure set up by the European Commission for research, with the aim to simplify the funding channels between projects. The EOSC was inspired by the F.A.I.R. principles, representing Findable, Accessible, Interoperable and Reusable data sharing, and aims to become a 'global structure, where as a result of the right standardization, data repositories with relevant data can be used by scientists and other to benefit mankind' (EOSC European Commission, 2019). While these frameworks recognise that the information and knowledge are collectively created, their implementations are hierarchical and top-down without input from archive participants or repository managers. Additionally, existing commons and data commons frameworks do not protect the personal data within them as they prioritise data sharing over data protection, particularly on data curation and reuse. The EU frameworks acknowledge the GDPR as the source for the right to data protection under the law; however, it is currently unclear how it is implemented. Other applications of data commons include smart cities. Governments have used commons principles to take more responsibility over their citizens' personal data (Decode European Commission, 2018). These include the Bristol Approach city commons,¹³ the Barcelona City Council Digital Plan¹⁴ and Data Commons,¹⁵ and the Commons Transition Plan for the City of Ghent urban commons.¹⁶ However, smart city frameworks often rely on dynamic consent (Teare, 2019) and informed consent (Mikkelsen et al., 2019). As suggested in the previous discussion, the lack of knowledge and understanding of data protection by data subjects limits their ability to meaningfully consent to the collection, processing, and result of their personal data. Data subjects are also only able to make their decisions based on information that is provided to them and have no information as to the dataset they may be a part of. A data commons framework for data protection can move beyond those of existing research- and smart city-focused commons, applying Ostrom's theory so that data subjects can co-create the data protection responsibilities alongside data controllers, data managers, researchers, and civil society.
¹³ The Bristol Approach <https://www.bristolapproach.org/>, accessed 15 December 2019
¹⁴ Barcelona City Council Digital Plan <https://ajuntament.barcelona.cat/digital/sites/default/fles/LE_MesuradeGovern_EN_9en.pdf>, accessed 15 December 2019
¹⁵ Barcelona Data Commons <https://ajuntament.barcelona.cat/digital/en/blog/ethical-andresponsible-data-management-barcelona-data-commons>, accessed 15 December 2019
¹⁶ Commons Transition <https://commonstransition.org/commons-transition-plan-city-ghent/>, accessed 15 December 2019
Building upon general principles of existing data commons such as the Data Biosphere (Denny et al., 2017) on a modular, community-driven, open, standards-based governance and the National Library of Medicine (NLM) on the necessity for security, searchability, standardisation of metadata, and the management of access control (Brennan, 2018), a data protection-focused data commons can serve as a technical solution within data protection legal structures. Unlike existing research data commons frameworks that focus on the dissemination of data and increased funding opportunities for research, a data commons for data protection focuses on data subjects to further their ability to protect the processing of their personal data. The framework can be used to balance the protection of the rights of data subjects with safeguarding the scientific process and integrity of research results for researchers during the data curation process. A data commons is useful because mechanisms such as licensing for data archives may not be useful for data protection even if they limit forms of data reuse (Guadamuz, 2006). For a data archive data commons, the data subject can better maintain control over their data through the research process (Powell, 2015) and reduce the risks of personal data being misused, with severe repercussions to the data subject. This is especially important when curated data that used to be in the public domain no longer are, with wider ramifications if the data are socially and politically sensitive, such as Twitter data on the 2014 Hong Kong Umbrella Movement (Tromble and Stockmann, 2017). While digital data archives aim to preserve, reuse, and promote ethically sound, methodologically well-grounded research, there continues to be insecurity among researchers about data sharing, where social media data sharing may become hidden and informal (Weller and Kinder-Kurlanda, 2017). A data protection-focused data commons applied to data archives of curated data could also help clarify who the data controllers are from a wide range of archive owners, dataset owners, or participants, identifying who is accountable to and for the publicised data and the resulting reuse outputs. Without a data commons, questions such as 'Who maintains control over curated data?', 'How can data controllers limit who and how collected data is reused?', and 'How can data subjects exercise their data protection rights when sensitive and identifiable personal data that could potentially be deanonymised is curated?' remain difficult to answer. With a data commons, existing standards and review mechanisms such as research ethics board reviews, funding body requirements, and institutional policies can be integrated into the data commons, acting as the first level of safeguard for data protection for data subjects in the future. Researchers who work in data curation and examine data protection practices on archive data reuse can offer privacy principles for individuals and organisations to adhere to. Although ethics approval may be granted by institutions, if reuse of the research data by third parties is granted, the participant as the data subject may not know what stakeholders have access to their personal data and for what purpose. In cases where ethics approval is dubious, researchers may take their work outside of institutions and use data subjects' personal data in commercial ways, as in the case of Cambridge Analytica. This can be mitigated with a data commons where future researchers looking to use the data archive can utilise it to see what data limitations have been discussed and set by data subjects, proceeding with reuse if those requirements are met. Collective participation by different stakeholders further allows research participants to assess these forms of use and curate the data archive data for themselves, engaging with their own research interests while participating in the community interest. The data commons acts as a new means for data management where the reasons for use, limitations on reuse, and recourse after data are aggregated and anonymised are all contained within one ecosystem.
Figure 2. In a data commons (green), the data subject specifies to what extent they would like their data to be protected based on existing conflicts and challenges pre-identified within the data commons for the use case (red). No prior knowledge of existing law, norms, or policies is required. Along with stakeholder information (blue), the data subject specification is then used to inform their data protection outcome that is generated from the system. As the outcome is data subject-centred, decisions ensuring the protection of the data subject's personal data may override existing preferences, policies, or standards set by other stakeholders. Data subjects can return to and review their outcome, add their data subject experiences to the data commons, and participate in the co-creation process at any time.
To implement a data commons, various legal and technological components need to be created for stakeholders to be engaged. Figure 2 shows how a data subject specifies to what extent they would like their data to be protected based on existing conflicts and challenges pre-identified within the data commons for the use case. In addition to data subject preferences, using information such as data controller policies, research papers, and input from civil society, a data subject specification is created and used to inform their data protection outcome that is generated from the system. For example, the data subject-centred outcome could ensure that the data subjects' archive data are only to be reused by the researchers and data managers directly associated with the archive within the data commons and not by specific external researchers. The archive researchers would automatically be notified of the data subject's preferences. This allows the data subject to set their own limitations on how their data are used as opposed to it being decided by the archive itself. As the outcome is data subject-centred, decisions ensuring the protection of the data subject's personal data may override existing preferences, policies, or standards set by other stakeholders. Data subjects can return to and review their outcome, add their data subject experiences to the data commons, and participate in the co-creation process at any time. Data controllers can address new risks before collecting data, minimise the potential for data breaches, and meet stakeholder demands. Other stakeholders, such as researchers and civil society, can participate in the data commons to make the data sharing process more transparent, support exercising group rights, and provide information and standards for FAIRsharing (Sansone et al., 2019). These requirements aim to decrease the power imbalance between data subjects and controllers. Mapping the development of a data commons onto Ostrom's CPR design principles, a data commons will be clearly defined based on its use case, where each stakeholder's role is detailed. All data subjects that would like to find out more information about the use case, contribute, or co-create are free to participate in the data commons. Any bad practices, unethical behaviour, and data breaches will be identified by the data commons system, with the remedies updated as stakeholders respond. Data subjects and other stakeholders can collaborate and establish their own norms, such as co-creating data sharing practices which promote data protection by design and facilitate data reuse for data research projects amongst a group of researchers.
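The archive example above, in which a data subject limits reuse to archive researchers and data managers even though the archive would also permit external researchers, can be illustrated with a short sketch. This is only one possible reading, written under assumptions: the `Policy` class, the role names, and the rule that the data subject's preference can only narrow what stakeholders permit are all hypothetical and are not specified in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical record of who one party permits to reuse archived data."""
    source: str                  # e.g. "archive" or "data_subject"
    allowed_reusers: frozenset   # role names permitted to reuse the data


def data_protection_outcome(subject_preference, stakeholder_policies):
    """Derive a data subject-centred outcome (illustrative assumption only).

    Roles permitted by any stakeholder are collected first, then restricted
    to the roles the data subject accepts, so the data subject's preference
    overrides broader stakeholder policies.
    """
    permitted = set()
    for policy in stakeholder_policies:
        permitted |= policy.allowed_reusers
    allowed = permitted & subject_preference.allowed_reusers
    overridden = permitted - allowed  # stakeholder permissions that were overridden
    return {"allowed": allowed, "overridden": overridden}


# Hypothetical example mirroring the archive scenario in the text.
archive_policy = Policy(
    "archive",
    frozenset({"archive_researcher", "data_manager", "external_researcher"}),
)
subject_preference = Policy(
    "data_subject",
    frozenset({"archive_researcher", "data_manager"}),
)
outcome = data_protection_outcome(subject_preference, [archive_policy])
```

Here `outcome["overridden"]` lists the roles that the archive would have allowed but the data subject did not, which is the information the archive researchers would need to be notified about.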

Data Commons for Data Curation
We now outline data curation of public data archives as a use case for the data commons and assess how a data commons could address stakeholder issues by increasing accountability for data subjects' personal data, encouraging collaborative curation, and allowing for data protection to be an iterative process.
Using the Umbrella Movement as the use case, a data curation data commons created at the beginning allows potential research participants as data subjects to see what and how their data will be shared with data controllers and researchers, raising and addressing any concerns respectively. Applying Ostrom's design principles, the data commons for data curation will have clearly defined boundaries as to the kinds of data and metadata it will archive and when such archival will cease. In this example, data subjects would like to publicise their experiences from the Umbrella Movement on a platform in the public domain. To participate, the data subject can identify the most applicable data commons by searching for keywords such as social media, data curation, research data, and data reuse. The identified data commons would include information about data controller policies on personal data, research and archiving, other data subjects' experiences and outcomes from exercising their data subject and information rights, recent news and scandals on data controllers, and expert and researcher findings from their work based on relevant topics and tags. This allows the data subject to identify what settings there are for preferences such as limiting the audience, how information can be published in public and in private, whether data can be deleted, how published information could be used (by whom, how, and through what process), what intellectual property policies apply to the published information, and how other data subjects felt about the platform's responses to information rights based on their experiences.
Without a data commons, the data subject would have to search for this information independently, looking for forums for information. After identifying the most relevant framework, Twitter for example, based on different stakeholder knowledge, the data commons uses the information and prompts the data subject to select a few preferences by answering questions based on the conflicts and challenges that have arisen from them. These questions could include 'Do you want your data or posts to be publicly archived? Available to researchers?', 'If your data or posts were deleted at a later date would you want to notify researchers of such request, for example to not include your data in future studies?', and 'Do you want a mechanism to hide all or some of your data and posts?'.
Based on the data subject's responses to those questions, the data commons chooses the platform settings and data actions that best align with the data subject's aims. These may override platform policies based on data subject requests. For example, suppose that during the Umbrella Movement, Twitter decides to change its historical archive policy regarding the removal of deleted tweets. This change would be automatically reflected in the data commons through technical means, notifying the stakeholders in the system. Although Twitter automatically removes deleted Tweets from its data archive, if a data subject would like that information to be kept in certain pieces of work, researchers have a right to retain such data until further notice by the data subject. Requests by data subjects could be specific: for example, any Tweets that include the term 'Umbrella Movement' can be kept, while ones with 'universal suffrage' or 'Hong Kong independence' can be removed.
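The keyword-specific requests described above can be illustrated with a minimal sketch. It assumes a hypothetical `RetentionPreference` record and a hypothetical `decide` function; neither is part of any existing system, and the rule that removal terms take precedence over retention terms is our own assumption, chosen because it favours the more protective outcome.

```python
from dataclasses import dataclass, field


@dataclass
class RetentionPreference:
    """Hypothetical record of one data subject's keyword-based wishes."""
    keep_terms: list = field(default_factory=list)
    remove_terms: list = field(default_factory=list)


def decide(tweet_text: str, pref: RetentionPreference) -> str:
    """Return 'remove', 'keep', or 'platform_default' for a single Tweet.

    Assumption: removal terms are checked first, so a Tweet that matches
    both lists is removed rather than kept.
    """
    text = tweet_text.lower()
    if any(term.lower() in text for term in pref.remove_terms):
        return "remove"
    if any(term.lower() in text for term in pref.keep_terms):
        return "keep"
    # No stated preference applies, so the platform's own policy governs.
    return "platform_default"


# The example terms are taken from the scenario in the text.
pref = RetentionPreference(
    keep_terms=["Umbrella Movement"],
    remove_terms=["universal suffrage", "Hong Kong independence"],
)

tweets = [
    "Joining the Umbrella Movement tonight",
    "We demand universal suffrage",
    "Umbrella Movement and Hong Kong independence",
    "Lunch in Admiralty",
]
decisions = [decide(t, pref) for t in tweets]
```

Simple substring matching is used here only for illustration; a real data commons would need to decide, with its stakeholders, how terms are matched and how conflicts between preferences are resolved.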
Researchers would be notified of these preferences. The system can then be updated to match personal preferences with secondary resources to create a more comprehensive picture of what preferences data subjects would collectively like with regards to data curation, sharing, and reuse. Experts such as Tromble and Stockmann can advise data subjects on what they can do in light of new policy changes as well as data controllers on how to address any concerns raised.
Even without misuse, and even with the right to erasure under GDPR Article 17, certain forms of data archival and curation can make removing personal data difficult, particularly if such data have already been reused in research. If Twitter suffered a data breach that released the personal data of data subjects during the Umbrella Movement, the data commons could address the breach by supporting data subjects in exercising the right to erasure and by sending notifications to data controllers requesting that their data be removed. Researchers who have used the affected datasets and data would also be alerted to the breach and, should the right to erasure be exercised, be required to issue corrections in their work and remove identifiable data relating to the data subject from their data archives. Automatic detection of subsequent attacks caused by the data breach can also prompt the system to alert data subjects and support them in exercising their data subject rights, if they have not already done so, as well as in looking for alternative platforms that better support data protection practices.
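The breach response described above can be sketched as a simple routine that works out which researchers to alert and which must also remove identifiable data. All names here (`DatasetUse`, `breach_actions`, the action labels, and the example identifiers) are hypothetical and introduced only for illustration; the sketch assumes the data commons already records which data subjects appear in which research datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetUse:
    """Hypothetical record linking a researcher's dataset to its data subjects."""
    researcher: str
    dataset: str
    subject_ids: frozenset


def breach_actions(breached, erasure_requested, uses):
    """List (researcher, dataset, action, subject_id) tuples for a breach.

    Every researcher holding data on a breached subject is alerted. Removal
    and correction are required only where that subject has exercised the
    right to erasure, mirroring the condition stated in the text.
    """
    actions = []
    for use in uses:
        for sid in sorted(use.subject_ids & breached):
            actions.append((use.researcher, use.dataset, "alert_breach", sid))
            if sid in erasure_requested:
                actions.append(
                    (use.researcher, use.dataset,
                     "correct_and_remove_identifiable_data", sid)
                )
    return actions


# Illustrative, made-up identifiers.
uses = [
    DatasetUse("researcher_a", "umbrella_tweets_2014", frozenset({"s1", "s2"})),
    DatasetUse("researcher_b", "hk_protest_sample", frozenset({"s3"})),
]
actions = breach_actions(breached={"s1", "s2"}, erasure_requested={"s2"}, uses=uses)
```

In this example only `researcher_a` holds data on breached subjects, so `researcher_b` receives no alert, and removal is required only for the subject who requested erasure.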
In deciding the best platform and settings for the data subject's purpose of broadcasting the Umbrella Movement, further advice is also provided on how the data subject's data can be best protected. This includes: setting up an account with a disposable email address, using an anonymous platform username, setting up tools that can automatically delete the data subject's posts, and consulting links on how to exercise information rights on the platform and on the successes and failures of other data subjects in this regard. This information is saved in the data commons and is accessible by the data subject at any time. Any information that the data subject has gathered can also be put into the data commons.

By applying Ostrom's principles to a data curation data commons, established methodologies and organisational structures are built into the data protection-focused framework while enabling individuals and groups whose data form part of the datasets to determine how their personal data are used. A data commons framework enables data protection because it operates as a polycentric system, working in tandem with data protection law and policy, data subjects and their rights, data controllers, data managers, and researchers to develop a better understanding of how personal data can be protected. The data commons simplifies the data protection rights procedure by including information, instructions, and templates on how rights should be collectively exercised, giving data subjects an opportunity to engage with and shape the data protection practices that govern how their personal data are protected.

Conclusion and Future Work
To overcome the limitations of laws and technologies in protecting group data, we propose a co-created data commons to maintain individual autonomy over personal data. With requirements identified from Ostrom's framework on the commons, the data commons supports more accountable data protection practices, collaborative data management, and data sharing for the benefit of data subjects and data controllers. Applying the data commons to the use case of data curation, and specifically to the collection and repurposing of Twitter data of the 2014 Hong Kong Umbrella Movement, we have shown how this framework could assist data subjects in limiting and preventing the misuse of identifiable public, sensitive personal information. The data commons encourages the co-creation of data protection for data subjects while also allowing them to participate in shaping how other stakeholders manage their personal data.
Future work can use the data commons requirements established in this paper to build a prototype of a data commons for data archival and data curation. In order to assess whether a data commons is useful to data subjects, technical and non-technical requirements should be developed to identify which stakeholders should be involved and what data could be incorporated into the system. Surveys and interviews could be conducted with data subjects to identify the issues that the data commons could prioritise in helping them achieve the data protection they want. Experts can also provide input on what they believe are the factors that support a successful data commons based on their knowledge of other commons frameworks. Further, the data commons should be built on the principles of the GDPR, such as data protection by design and by default (GDPR Article 25), where users can be anonymous in the data commons and any data included should be pseudonymised unless the data subject explicitly allows them to be identifiable. With the use of Ostrom's commons framework to develop a data commons, a prototype could be built to test the feasibility of the system in tackling stakeholder issues.
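As one possible starting point for such a prototype, the pseudonymised-by-default requirement could be sketched as follows. This is only an illustration under our own assumptions: the function name `display_identifier`, the `subject-` prefix, the truncation to 16 hexadecimal characters, and the use of a keyed hash (HMAC-SHA256) held by the data commons are all hypothetical choices, not an established design. A keyed hash provides pseudonymisation rather than anonymisation, since whoever holds the key can recompute the mapping.

```python
import hashlib
import hmac


def display_identifier(subject_id: str, secret_key: bytes,
                       allow_identifiable: bool = False) -> str:
    """Return the identifier under which a data subject appears in the commons.

    By default the real identifier is replaced by a keyed-hash pseudonym.
    The real identifier is shown only when the data subject has explicitly
    opted in (allow_identifiable=True).
    """
    if allow_identifiable:
        return subject_id
    digest = hmac.new(secret_key, subject_id.encode("utf-8"),
                      hashlib.sha256).hexdigest()
    # Truncated for readability; the length is an arbitrary illustrative choice.
    return "subject-" + digest[:16]


key = b"example-key-held-by-the-data-commons"  # placeholder, not a real secret
pseudonym = display_identifier("@example_user", key)
identified = display_identifier("@example_user", key, allow_identifiable=True)
```

The same input and key always yield the same pseudonym, so a data subject's contributions can still be linked to each other within the data commons without revealing their platform username.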
To conclude, in this paper, we established a framework for a co-created data commons that can rebalance power between data subjects and controllers. By including wider stakeholder participation, such as researchers and civil society, the polycentric system places the data subject at the centre, supporting data subjects from the beginning of the data protection process, prior to any data being collected. Acknowledging the legal mechanisms and technological tools available to data subjects in protecting their personal data, the data commons not only incorporates existing data protection safeguards but also takes into consideration the needs of the data subject as the fundamental means to protect individual autonomy over their personal data through collective action and co-creation.