How to Achieve Compliance with GDPR Article 17 in a Hybrid Cloud Environment

On 25 May 2018, Article 17 of the General Data Protection Regulation (GDPR), the Right to Erasure (‘Right to be Forgotten’), came into force, making it vital for organisations to identify, locate and delete all Personally Identifiable Information (PII) where a valid request to erase their PII is received from a data subject and the contractual period has expired. This must be done without undue delay, and the organisation must be able to demonstrate that reasonable measures were taken. Failure to comply may incur significant fines, not to mention reputational damage. Many organisations do not understand their data, and the complexity of a hybrid cloud infrastructure means they do not have the resources to undertake this task. The available tools are often unsuitable because they involve restructuring so that there is one centralised data repository. This research aims to demonstrate that compliance with GDPR’s Article 17 Right to Erasure (‘Right to be Forgotten’) is achievable in a hybrid cloud environment by following a list of recommendations. Although 100% retrieval, 100% of the time, will not be possible, we show that small organisations running an ad hoc hybrid cloud environment can demonstrate that reasonable measures were taken to comply with the Right to Erasure (‘Right to be Forgotten’).


Introduction
The new General Data Protection Regulation (GDPR) came into force on 25th May 2018, replacing the existing data protection framework. Ireland's Data Protection Commissioner, Helen Dixon, has publicly stated that GDPR improves the rights of data subjects by awarding them control over their Personally Identifiable Information (PII) [1]. This new regulation also imposes strict obligations on data controllers and data processors, who may incur significant fines of up to 20 million euro if they cannot demonstrate compliance. In recent years, many small organisations have become dependent on a hybrid cloud environment that they haphazardly implemented as a solution to meet their business needs. Based on the popularity and widespread adoption of these solutions, the hybrid cloud market is expected to grow [2]. No two hybrid clouds are alike, and few standards exist, which presents even further challenges. The introduction of the new GDPR Article 17 legislation, which awards individuals the right to request the removal of their data from third-party systems and storage, imposes a variety of burdensome tasks upon small organisations, requiring them to rethink and modify how they manage PII. Many organisations are only processing and using a fraction of the data they store and therefore clearly do not understand their data [2]. This can be due to sprawling legacy systems, siloed databases and sporadic automation. PII is a very valuable commodity for hackers. Despite this, many small organisations mistakenly believe they have nothing worth stealing or that they are too small to gain a hacker's attention. Consequently, investing demonstrate reasonable measures were taken to become Right to Erasure ('Right to be Forgotten') compliant and demonstrate that it is able to identify, locate and report the location of PII for a specific data subject upon receiving a valid request and where the contractual date is due to expire.

General Data Protection Regulation (GDPR)
GDPR automatically became law in each Member State without the need for local implementation. It aims to modernise and harmonise data privacy laws between the Member States, and it introduced one legal framework to improve enforcement and reduce costs for organisations, with the hope of encouraging economic growth across Europe [5]. GDPR also aims to improve and expand the rights of data subjects, giving them control over the collection and processing of their personal data [1]. GDPR contains 99 Articles, which small organisations find challenging to understand and comply with [5]. A brief overview of the key elements of GDPR is shown in Figure 1. GDPR is applicable globally to any organisation that collects, stores, processes or monitors a European resident's personal data, regardless of location or nationality, and it includes free goods and services [6]. It covers both computerised and hard copy fileable data. Under GDPR the data protection authorities have the ability to impose sanctions, with possible publicity, and can impose significant fines of up to 20 million euro [1]. Compensation may also be payable to individuals whose rights have been breached. GDPR introduces four new roles, which are:
• Data Subject: A data subject is a natural living person who can be identified directly or indirectly [1]. A data subject is anybody residing in the EU, not just EU citizens.
• Data Protection Officer: A Data Protection Officer (DPO) (Data Protection Commission, 2018) must have specialist skills and expertise to oversee GDPR compliance, ensuring obligations are met from the highest level of management. The DPO is the point of contact for the supervisory authority and monitors the organisation's compliance with the law. The DPO can be either an employee or an outsourced service. Under GDPR, whilst it is mandatory for organisations to appoint a DPO, small organisations with fewer than 250 employees are exempt.
• Data Processor: A data processor is an organisation, such as a cloud hosting provider, that processes data as instructed by its data controller [5]. Recognising the complexity of modern-day data processing relationships, GDPR identifies that data processors play a vital part in the protection of European citizens' data and so introduced direct rules for data processors, such as record keeping and reporting data breaches.
• Data Controller: The data controller is the organisation that collects, processes and stores PII and must be able to demonstrate GDPR compliance, which means the burden of proof lies with it [7].
Under GDPR consent must be freely given, specific, informed and unambiguous [8]. All EU contracts must be valid and reflect the individual's new rights. PII is any information that can be used on its own, or combined with another piece of information, to identify a living EU resident, such as name, address, IP address, Personal Public Service Number (PPSN), account details, etc. PII is either sensitive or non-sensitive. Sensitive PII could cause harm to a data subject if breached and therefore must be encrypted both in transit and at rest [9], whilst non-sensitive PII will not cause harm to a data subject and therefore can be unencrypted. The key obligations imposed on organisations by GDPR are as follows.
• Data Protection Impact Assessment (DPIA): A DPIA aims to identify potential risks involved in the collection, processing and storage of PII and the impact on the privacy of the data subject, and to identify ways to mitigate those issues [8].
• Transparency: An organisation must have a granular level of transparency into its PII, from consent, collection, processing and storage, for the full life cycle of that data, and mandatory clauses (EU, 2016).
• Data Minimization: PII can only be collected and processed where there is an identifiable reason why it is needed. It should be kept no longer than is necessary for the purpose for which it was collected, and no additional data can be obtained [8].
• Security: Organisations must ensure that technological and organisational methods are in place to securely protect PII as per industry standards and best practices [7]. Implementing an ISO 27001 compliant ISMS would assist in achieving compliance [10].
A data controller must report a data breach to the Data Protection Commissioner within 72 hours and notify data subjects unless there is no risk of harm to them [7]. GDPR has increased and strengthened the rights of data subjects [11].

Right to Erasure ('Right to Be Forgotten')
Article 17 of the EU General Data Protection Regulation (GDPR), the Right to Erasure ('Right to Be Forgotten'), was originally known as the Right to be Forgotten (RTBF) but is now called the Right to Erasure [12]. This right proves to be the toughest data subject right to operationalise and the second most difficult GDPR obligation overall in practice [13]. The Right to Erasure ('Right to Be Forgotten') is a fundamental right of data subjects to ask a controller to erase all their PII, and the controller must do so without undue delay and free of charge in accordance with GDPR Article 17 [8]. This right does not only apply to search engines, but to any organisation that collects, processes or stores PII. For example, if you used to be an Eircom customer and no longer are, you can ask them to get rid of your PII. That is the Right to Erasure ('Right to Be Forgotten'). The term "Right to be Forgotten" is a concept which originated from individuals' need to "determine the development of their life in an autonomous way, without being perpetually or periodically stigmatised as a consequence of a specific action performed in the past." [14]. This concept has been practised in the European Union (EU) and Argentina since 2006 [15], and there have been many discussions and debates over the years regarding its vagueness, its impact on the right to freedom of expression, its interaction with the right to privacy, and whether creating an RTBF would decrease the quality of the internet through censorship and the rewriting of history. Other concerns relate to problems such as revenge porn sites appearing in search engine listings for an individual's name, or references to petty crimes committed many years prior still being linked and displayed as part of an individual's footprint [16].
In 1995, the EU adopted the European Data Protection Directive, Directive 95/46/EC, to regulate the processing of personal data, aiming to secure potentially harmful private information relating to an individual [16]. On 13 May 2014, in the Google Spain v AEPD and Mario Costeja González case, the European Court of Justice ruled that people have the right to be forgotten, solidifying it as a human right. The irony is that Mr González's intention was to obscure that information, but the case resulted in worldwide publicity (Ahmed, 2015). Courts worldwide have been referring to the European Court of Justice (2014) ruling on the right to be forgotten [17]. Then in 2016, with the introduction of the General Data Protection Regulation, this principle was modernised to bring it into alignment with digitalisation [18]. The grounds upon which a data subject can exercise the right to be forgotten are as follows [18].

• The data is no longer required for the purpose for which it was originally collected
• The data subject withdraws consent
• The data subject objects to the processing and there are no overriding legitimate grounds
• The PII was processed unlawfully
• The PII must be erased to meet legal obligations
• The PII of a child was collected via information society services
Organisations must erase PII upon receipt of a valid request, and this must be done within 30 days and free of charge [19]. If it is not carried out without undue delay, the data subject can report this to the Data Protection Commission. Organisations therefore now need to be concerned about their employees, customers and suppliers as well as the authorities. Organisations must also erase PII once it expires. This is quite a complex task for most small organisations: they must understand what PII they retain, why they need this data and how long it can be retained, and they need to identify and locate PII throughout the entire hybrid cloud infrastructure, including Excel, Word and PowerPoint files, backups, etc. Most do not have a clear understanding of where all the PII they retain is stored, including at third parties. In fact, with the expanded definition of PII, they may not have a full understanding of all the data that should be classified as PII. Adding to this is the complexity of the data landscape within a hybrid cloud infrastructure. Additionally, they will not have the expertise or resources needed to undertake such a task. When using cloud services or third parties, both parties must understand what PII they have, why they have it and the liabilities involved. Retaining expired PII is a liability because, if a breach occurs, compensation payouts will apply not only to existing clients but also to clients the organisation no longer has.
Organisations must have a system in place to easily identify, locate and report all PII for one data subject, and a system to identify, locate and report all PII that has expired, so that it can be reviewed and deleted promptly. They must have this documented, so that it can easily be followed and used to demonstrate that they have procedures in place to meet compliance. Automation will be an important part of this compliance, as having employees randomly looking through personal data to identify, locate and report PII would itself be a privacy issue. Because storage was cheap, many organisations tended to store extra data in case it might be useful later, which was easier than putting processes in place to check for and remove obsolete data. Now this extra data must be erased and only the relevant data retained.
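The two lookups described above (all PII for one data subject, and all PII whose retention period has expired) can be pictured with a small sketch. The following Python fragment is illustrative only and is not the tooling used in this study. The `PiiRecord` type, the `REGISTER` entries, the subject identifiers and the location strings are all hypothetical, and a real register would be populated by automated discovery rather than by hand.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PiiRecord:
    subject_id: str     # hypothetical internal identifier for the data subject
    location: str       # where the PII lives (host, database or path)
    retain_until: date  # last day of the agreed retention period


def locate_subject(register, subject_id):
    """Return every register entry that holds PII for one data subject."""
    return [r for r in register if r.subject_id == subject_id]


def locate_expired(register, today):
    """Return every register entry whose retention period has ended."""
    return [r for r in register if r.retain_until < today]


# Hypothetical register entries, for illustration only.
REGISTER = [
    PiiRecord("S001", "VM1/DB1/Customers", date(2019, 1, 31)),
    PiiRecord("S001", "VM6/archive/report.pdf", date(2017, 6, 30)),
    PiiRecord("S002", "VM3/DB2/Customers", date(2020, 12, 31)),
]

if __name__ == "__main__":
    for record in locate_subject(REGISTER, "S001"):
        print("subject match:", record.location)
    for record in locate_expired(REGISTER, date(2018, 5, 25)):
        print("expired:", record.location)
```

The sketch prints locations only, so the resulting list can be reviewed before deletion without the reviewer having to open the personal data itself.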
It may be impossible to truly enforce the right to be forgotten. For example, data can be outside the control of an organisation when an individual uses a smartphone to take pictures of personal data or takes a screen print, and these copies can be distributed to various other locations at the click of a button using private email or removable devices. Another consideration is that deleted files are not erased: they are still contained on the hard drive, even after emptying the recycle bin, thus enabling the recovery of PII [20]. It can be impossible to delete a single record for some PII without impacting other PII, e.g., on microfiche, so it is not feasible to destroy this without losing other data that is still required by the organisation [21]. There are also built-in features like the Volume Shadow Copy Service (VSS) whereby data can easily be recovered once deleted [9]. Deleted data can also be recovered in an SQL Server database using Log Sequence Numbers (LSNs) or by using third-party software like SQL Database Repair [22]. Deleted data is recoverable, whereas data that is properly erased is permanently gone [23]. In some cases [24], it is possible to recover almost all deleted browsing activity. PII can be held on any device that has permanent memory, such as a desktop, printer, laptop or external hard drive, so deciding whether to overwrite or destroy will depend on whether the organisation will use the device again [10]. With the introduction of GDPR, small organisations must monitor and manage their PII. Under GDPR, PII refers to any information that can be used to identify a specific living individual. Personal identifiers are displayed in the diagram above; however, due to technology, the scope has expanded to include IP address, login credentials, social media posts, geolocation, biometric, genetic and behavioural data. This expanded scope increases security and privacy challenges.
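The distinction drawn above between deleting a file and erasing it can be sketched in a few lines. This is a minimal Python illustration and not the procedure used in this study: the function name `overwrite_and_delete` and the demo file are hypothetical. It also assumes a storage device where writing in place really replaces the old blocks. On SSDs, journaling or copy-on-write file systems, and wherever snapshots or backups exist, copies of the data may survive an overwrite.

```python
import os
import tempfile


def overwrite_and_delete(path, passes=1):
    """Overwrite a file's bytes in place, force them to disk, then unlink it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 1024 * 1024)  # write at most 1 MiB at a time
                handle.write(os.urandom(chunk))
                remaining -= chunk
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the new bytes to the device
    os.remove(path)


if __name__ == "__main__":
    # Hypothetical demo file holding a made-up record.
    fd, demo_path = tempfile.mkstemp()
    os.write(fd, b"name=Jane Doe;ppsn=0000000X")
    os.close(fd)
    overwrite_and_delete(demo_path)
    print("still present:", os.path.exists(demo_path))
```

A plain `os.remove` on its own only removes the directory entry, which is the situation the paragraph above describes as recoverable.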
Adding to this mixture are the challenges of direct and indirect personal data/information [25]. GDPR is applicable to automated PII, manual filing and pseudonymised PII [10]. Under GDPR, personal data includes special categories of personal data [8], such as genetic data and biometric data that uniquely identify an individual. Data relating to crime is excluded [26]. Personal data can be broadly categorised as structured, semi-structured and unstructured.
Structured data/information refers to data that is highly organised, for example data stored in a relational database such as SQL or in an Excel spreadsheet. This type of data is easy to find, filter and search [27].
Semi-structured data/information refers to data that cannot neatly fit inside a relational database but does have some structural properties that allow for analysis [28].
Unstructured data/information refers to data which is unorganised and does not have a pre-defined model, such as Word documents. It cannot neatly fit inside a relational database and is incredibly difficult to identify, locate, manage and use. This is the data that organisations struggle with when trying to meet Right to Erasure ('Right to be Forgotten') compliance: it is impossible to scrutinise and therefore must be transformed into a structured format, otherwise it is of no use to the organisation [27]. Unstructured content is typically text-heavy or multimedia and is estimated to represent more than 80% of the overall business information created and used. The volume of unstructured data held in various repositories within a hybrid environment increases continuously, with the result that identifying and locating it becomes more and more difficult to manage [26].
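To illustrate why unstructured text is harder to search than a database column, the following Python sketch looks for two of the identifier types mentioned earlier (email-style account details and IP addresses) in free text. It is an assumption-laden illustration, not the method used in this study: the `PII_PATTERNS` table, the `find_pii` function and the sample sentence are hypothetical, and the regular expressions are deliberately simple, so they will miss some valid values and accept some invalid ones.

```python
import re

# Simplified, illustrative patterns; real discovery tools use far richer rules.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_pii(text):
    """Return (kind, matched_text) pairs for every pattern hit in free text."""
    hits = []
    for kind, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group()))
    return hits


# Made-up sample text using reserved example values.
SAMPLE = "Contact jane.doe@example.com; last login from 192.0.2.44."

if __name__ == "__main__":
    for kind, value in find_pii(SAMPLE):
        print(kind, value)
```

Names, addresses and other identifiers that do not follow a fixed pattern cannot be found this way, which is one reason unstructured repositories are the hardest part of an erasure request.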

Cloud Computing
Cloud computing hosts and delivers various services over the Internet to store, manage and process data [29]. It has had a remarkable effect on Information Technology as cloud providers like Google, Amazon and Microsoft compete to make their cloud platforms the most powerful, cost-effective and reliable. This in turn enables organisations to improve their business models, and they no longer need to plan for provisioning, as resources are allocated according to the level of demand. One important aspect of the cloud is that cost is normally in proportion to demand, which can be influenced by performance requirements. Resources must be allocated efficiently to ensure effective planning of costs and resources for both the client and the service provider [29]. Cloud service providers aim to offer methods to allocate or deallocate resources on demand to meet the service levels in the contract, or Service Level Agreement (SLA). Cloud computing has four deployment models [27]. A deployment model defines the purpose of the cloud and the nature of how the cloud is located. "The NIST Definition of Cloud Computing" classified cloud computing into four cloud types (public, private, community and hybrid) and also into the three SPI service models: SaaS, IaaS and PaaS [29].
In Infrastructure as a Service (IaaS), clients can provision virtual machines, virtual storage, virtual infrastructure, etc. The service provider is responsible for the management of all the infrastructure, whilst the client is responsible for all the other aspects of deployment, including the operating system, applications and user access. In Platform as a Service (PaaS), clients can provision virtual machines, operating systems, applications, services, deployment frameworks, transactions and control structures. Clients can also deploy their own applications on the cloud infrastructure or use applications and tools supported by the service provider. The service provider is responsible for the management of the cloud infrastructure, the operating systems and the enabling software, whilst the client is responsible for the installation and management of the application it deployed. Software as a Service (SaaS) is a complete operating environment with applications, management and the user interface. An application is provided to the client through a thin client interface (usually a browser). The service provider is responsible for everything from the application down to the infrastructure, whilst the client's responsibility starts and ends with entering and managing its data and user interaction. SaaS is on-demand software which is charged on a pay-per-use basis.
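The division of responsibility described above can be summarised as a small lookup. The Python sketch below is a simplification made for illustration: the four-layer split in `LAYERS`, the `CLIENT_STARTS_AT` table and the `responsible_party` function are assumptions introduced here, not part of the NIST definition, and real contracts divide responsibility in finer detail.

```python
# Simplified stack, ordered from the bottom layer to the top layer.
LAYERS = ["infrastructure", "operating system", "application", "data"]

# Index of the lowest layer the client manages under each service model,
# following the descriptions in the text above.
CLIENT_STARTS_AT = {"IaaS": 1, "PaaS": 2, "SaaS": 3}


def responsible_party(model, layer):
    """Return 'client' or 'provider' for one layer under one service model."""
    if LAYERS.index(layer) >= CLIENT_STARTS_AT[model]:
        return "client"
    return "provider"


if __name__ == "__main__":
    for model in CLIENT_STARTS_AT:
        row = ", ".join(f"{layer}: {responsible_party(model, layer)}" for layer in LAYERS)
        print(f"{model} -> {row}")
```

In all three models the data layer stays with the client, which is why moving to a cloud service does not move the duty to locate and erase PII away from the organisation.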

Hybrid Clouds
A hybrid cloud is a cloud computing infrastructure integrating multiple different cloud models (public, private or community), each retaining its unique characteristics but bound together as one unit. It offers standardised or proprietary access to data and applications, and application portability. This concept is also referred to as cloud bursting, according to [26]. With this model an organisation utilises its own computing infrastructure to handle its normal requirements, but any spike in requirements is handled by public cloud services. There are many issues, such as cloud interoperability and standardisation, in the hybrid cloud computing model. Critical activities can be performed within the private cloud and non-critical activities within the public cloud, according to [29]. Advantages include the scalability of an on-demand, externally provisioned cloud, whilst also availing of increased security, privacy and auditability. It provides a variety of options that can be utilised via public or private clouds, whereby an organisation can select the most cost-effective delivery method for agile business requirements whilst staying within strict security and service level agreements. Disadvantages are that applications are spread across different environments, adding complexity and the need to increase management and monitoring within the environment. It is best suited to organisations that need support for non-critical applications, great scalability, flexibility and optimal service levels, together with new agile environments requiring new services to be available immediately. The hybrid is ideal for an organisation utilising a private cloud that incurs peaks in demand requiring resource elasticity, where the cost of permanently having the benefit of resource elasticity far outweighs the access costs of on-demand.
A hybrid cloud enables the small organisation to take advantage of the benefits of public cloud where it can, yet keep PII in a private cloud, giving it the option of moving to the cloud gradually if it so chooses. A key issue with hybrid cloud is that many organisations have haphazardly moved into it rather than having chosen a hybrid strategy.
One of the key benefits of hybrid is that with public cloud you get hardware, networking, storage, services and interfaces owned and operated by a third party for use by other organisations or individuals. Whilst there are a variety of public cloud service providers available, Amazon AWS was selected for this test scenario as it provides a Free Tier, practises ISO 27k industry standards and PCI DSS best practices, and is the most popular. The private cloud is like the public cloud in that you get hardware, networking, storage, services and interfaces; however, it is owned and operated by the organisation. This private cloud will be secured to the on-premise environment. Many organisations chose hybrid because the total cost of ownership (TCO) with cloud solutions is far lower than the ongoing costs of maintaining on-premise hardware. However, there are organisations already using public cloud that are looking to private cloud solutions to reduce costs, particularly with high volumes of data, as this incurs more storage and network charges. So, it is about finding the right mix of public and private cloud solutions to gain cost savings. The hybrid cloud allows organisations to build redundancy into their IT architecture, giving them extra security in the event of Disaster Recovery (DR). It also provides scalability if they need to scale up or down depending on spikes and troughs. Databases in the hybrid cloud are handy for new applications when you are not sure how successful they will be in the marketplace. Many organisations may want to bring an application to market quickly and cheaply, and therefore use public cloud resources for new, untested applications before going through the capital expenditure associated with launching in a private cloud. A hybrid cloud is beneficial for cloud bursting, so that workloads can spill over to another cloud to meet capacity demands. Providing a highly available, geo-redundant setup using a private cloud can be expensive to build. Many organisations cannot justify such expense, yet without one the organisation is vulnerable.
A hybrid cloud infrastructure comprises two or more different cloud infrastructures (private, community and/or public) that remain exclusive entities but are bound together by standardised or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds) [30]. The ISO/27k family of standards aims to help organisations, regardless of size, secure their information. These standards provide requirements for an information security management system (ISMS). The ISMS is a management framework that enables organisations to identify, analyse and address information risks. It ensures security arrangements are fine-tuned to keep pace with ever-changing security threats, vulnerabilities and business impacts, which is a crucial part in such a dynamic field. ISO27k's flexible risk-driven approach is advantageous compared to PCI-DSS.

Hybrid Cloud Test-Bed Design
We outline a test scenario for a small organisation that stores PII in a variety of data formats in various locations throughout a hybrid cloud environment. The test scenario examines whether the identification, location and reporting of PII for specific conditions can be successfully carried out within a hybrid cloud environment, and what challenges this presents. This research aims to produce guidelines for a small organisation with a hybrid cloud environment, so that it can become compliant with Article 17 of GDPR, the right to be forgotten principle. We also outline the setting up of the hybrid cloud test scenario. ISO/IEC 27001:2015, ISO/IEC 27002:2015 and ISO/IEC 27018:2014 were consulted when setting up the test environment and the PII. ISO/IEC 17788:2014, ISO/IEC 17789:2014 and ISO/IEC 19944:2017 were consulted when setting up the cloud components. Implementing a hybrid cloud environment really means setting up three different environments and then ensuring they are integrated, all the while enforcing data privacy. PII for a data subject was created in various data formats in various locations. Once the environment was set up, scripts were created to securely identify, locate and retrieve the location of the relevant PII, all the while adhering to data privacy. The ad hoc hybrid environment consists of a Local Area Network (LAN) containing the local office, a Wide Area Network (WAN) containing the on-premise private cloud infrastructure, and a public cloud infrastructure (see Figure 2).

Within the LAN, the local office has a physical server and a virtual server. The physical server holds an encrypted legacy Access database storing personal/sensitive data. The virtual machine named VM1 contains the main production SQL database server named DB1 (which is the "SampleDatabase"). As the business progressed, the original production Access database was upgraded to the SQL Server database DB1. The management laptop was a Windows 10 64-bit machine with VMware Workstation 14 Pro. Virtual machines consist of a gold image virtual machine and a virtual machine containing an SQL database. The WAN encompasses the on-premise private cloud infrastructure and the public cloud infrastructure. The private cloud is accessed via a FortiClient IPsec VPN, whilst the public cloud is accessed via the internet. The private cloud environment consists of an ESXi server-windows containing two virtual machines (VMs) named VM3 and VM4, one of which is used to store the production SQL database whilst the other is used to store critical documentation. VM3 contains an SQL database server named DB2, utilised as a backup of the production SQL database DB1. This caters for redundancy and provides high availability. VM4 is a backup of VM2, catering for load balancing, redundancy and high availability. A shared folder is used between the local office and the private cloud. The management laptop can access all VMs.
All databases on the WAN retain critical personal and sensitive data. The public cloud infrastructure is accessed via a Virtual Private Cloud (VPC) and is utilised to store backups. It contains a further two VMs named VM5 and VM6. VM5 is an RDS instance used as a backup of the production SQL database on VM3. VM6 is an EC2 instance that holds a backup of the various types of archived documentation, such as reports, Excel spreadsheets, Word documents, PDFs and CSVs.
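Because copies of the same PII sit on several hosts, it can help to hold the test-bed layout as data, so that a search can be checked against every store. The Python sketch below is an illustrative restatement of the hosts described above and is not taken from the study's scripts: the `TEST_BED` tuple layout, the zone labels and the `hosts_by_zone` helper are assumptions introduced here.

```python
from collections import defaultdict

# (zone, host, what it holds) as described in the text; the layout is illustrative.
TEST_BED = [
    ("LAN", "physical server", "encrypted legacy Access database"),
    ("LAN", "VM1", "production SQL database DB1"),
    ("private cloud", "VM3", "SQL database DB2 (backup of DB1)"),
    ("private cloud", "VM4", "backup of VM2"),
    ("public cloud", "VM5", "RDS backup of the production SQL database"),
    ("public cloud", "VM6", "EC2 backup of archived documents"),
]


def hosts_by_zone(entries):
    """Group host names by the part of the hybrid environment they sit in."""
    grouped = defaultdict(list)
    for zone, host, _holds in entries:
        grouped[zone].append(host)
    return dict(grouped)


if __name__ == "__main__":
    for zone, hosts in hosts_by_zone(TEST_BED).items():
        print(zone, "->", ", ".join(hosts))
```

A list like this makes it visible that erasing a record from DB1 alone would leave backup copies in both the private and the public cloud.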

LAN, Private and Public Cloud Setup
This experiment focuses on performing data discovery on a collection of structured, semi-structured and unstructured data, to identify and locate all PII within this hybrid environment for (1) PII relating to a specific data subject and (2) PII relating to a specific date. On both occasions, a report is produced containing the location of the PII, which is reviewed and, if the request is valid, the PII is deleted without undue delay. An inventory of all the hosts on the network, detailing host name and IP address, was carried out and the PII was identified and classified. To carry out this data discovery, a Super User is set up with AAA. This user must have authentication (a valid username and password and, where possible, two-factor authentication), authorisation to carry out the relevant activities, issue commands and run PowerShell scripts, and accounting, in order to gain access to and control of all the relevant devices, databases, folders, files and documents in all sections of the organisation, both internal and external. The Super User must have the correct enforcement policies in place to be able to reach the data they need and must have permission to use this data.
The hybrid cloud is set up to contain three components: a LAN, a Private Cloud and a Public Cloud. This small organisation will have one person allocated to the task of Right to Erasure ('Right to be Forgotten'). As a Hybrid cloud involves three separate environments, the user will be assigned administrative access to everything in each of the components. This user will be assigned a secure dedicated management laptop, set up to enable connectivity with every device within the hybrid cloud infrastructure. The rest of this chapter details the setting up of each component of the Hybrid environment.

Local Area Network (LAN)
As illustrated in Figure 3, the LAN consists of a management laptop, a virtual machine and a further laptop acting as a server to store the legacy encrypted Access database, which is locked away in a secure cabinet. VMware Workstation 14 Pro for Windows 10 64-bit was installed onto the management laptop and the relevant licences applied to enable the creation of virtual machines. A trusted SSL certificate was downloaded via vSphere, and FortiClient was also installed to enable a VPN connection with the private cloud. The relevant modules for AWS and SQL were imported into PowerShell. Two virtual machines were created in VMware Workstation. A shared folder was created between the management laptop and VM1. An Access database was created and named SampleDatabase.accdb. This database was upgraded to the SQL database stored on VM1 within VMware Workstation. VM1 will contain structured data. A further VM called VM2 was created and configured to act as a file server that stores documents. It was populated with a variety of semi-structured and unstructured PII stored in various locations within its file structure.
Figure 4 shows the Private Cloud environment consisting of two VMs. A physical server ESXi-6.0 in the LYIT Computing Data Centre (CDC) was configured with an up-to-date ESXi licence key, an IP address, default gateway, DNS Server and hostname of ESXi-15. A vSphere standard switch was created and a VLAN with ID 139 was added to the switch. To enable secure access with encryption, the VMware vSphere Client (with a trusted SSL certificate) was downloaded and installed onto the management laptop, and FortiClient 5.6 for Windows was downloaded and installed to provide secure remote access, with an IPsec VPN configured.

Public Cloud Environment
Amazon Web Services (AWS) was selected as the cloud provider because it has a large customer base, has gone through rigorous security scrutiny, is ISO 27k certified and provides a Free Tier. Although it provides a Free Tier, there is no mechanism in place to stop usage from going over the Free Tier limit. An EC2 instance was created to act as a file server holding a variety of document types containing PII for a data subject and PII with an expiration date. An RDS instance was created as a backup of the SQL production database. A Virtual Private Cloud (VPC) is created by default and can be configured to suit the organisation. A security group was set up for each class of instance. Adding and restricting ports can be fiddly, but for this test scenario port 3306 must be opened for databases and port 80 for HTTP. It is best practice to set up granular security groups instead of general ones.

Data Discovery
PII stored within a hybrid cloud infrastructure can be in various formats and locations. Therefore, a small organisation first needs to identify, locate and document all the hosts in its network. Many free tools can be downloaded and run to discover the hosts on the network, for example the tools bundled with Kali Linux. The organisation should then identify what retained data would be considered PII, and ensure this data is secured and protected. The first areas for review should be the high-risk areas that involve financial data or third parties. A review of all the business processes must be carried out to ensure PII was captured with valid consent and has a valid need for processing, and to establish how long the data is to be processed. PII can be either direct or indirect; this research focuses on the direct identification of the data subject, as it is about identifying and retrieving the location of PII for a data subject or a specific date. As this PII can be stored in a variety of data formats, it was decided to use different methods for identification as follows:

(a) Structured
From the database schemas the following fields listed in Table 1 were classified as direct PII. To search for PII for a data subject, a query was created to search the database for a data subject using "empLastName" and "empFirstName". The search variables can be changed to "empEmail" or "empContactNumber" or a combination of them all. To search for PII where the contractual date has expired, a query was created to search the database and report records where the "lastaccessdate" is equal to a date variable.
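The two queries described above can be sketched as follows. This is only an illustration under stated assumptions: the paper used SQL Server driven from PowerShell, whereas the sketch uses Python with an in-memory SQLite database as a stand-in. The table name `Employee`, the helper names and the sample rows are hypothetical; only the column names are taken from the text.

```python
import sqlite3

# Hypothetical table name; the column names come from the text above.
TABLE = "Employee"


def find_subject(conn, first_name, last_name):
    """Return rows whose name columns match the data subject (case-insensitive)."""
    sql = (
        "SELECT empFirstName, empLastName, empEmail, empContactNumber "
        f"FROM {TABLE} "
        "WHERE lower(empLastName) = lower(?) AND lower(empFirstName) = lower(?)"
    )
    # Parameter placeholders keep the supplied name out of the SQL text itself.
    return conn.execute(sql, (last_name, first_name)).fetchall()


def find_by_last_access(conn, date_iso):
    """Return rows whose lastaccessdate equals the supplied ISO date string."""
    sql = (
        "SELECT empFirstName, empLastName, lastaccessdate "
        f"FROM {TABLE} WHERE lastaccessdate = ?"
    )
    return conn.execute(sql, (date_iso,)).fetchall()


def build_demo_db():
    """Create an in-memory database with made-up sample rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {TABLE} (empFirstName TEXT, empLastName TEXT, "
        "empEmail TEXT, empContactNumber TEXT, lastaccessdate TEXT)"
    )
    conn.executemany(
        f"INSERT INTO {TABLE} VALUES (?, ?, ?, ?, ?)",
        [
            ("Philomena", "Kelly", "pkelly@example.com", "000-0001", "2018-05-25"),
            ("John", "Murphy", "jmurphy@example.com", "000-0002", "2019-01-10"),
        ],
    )
    return conn
```

Swapping the `WHERE` clause to match on "empEmail" or "empContactNumber" follows the same pattern; a real deployment would also need the connection details for each database in the hybrid cloud.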

(b) Semi-structured
A PowerShell script was created to search through all the folders in each drive for documents of type .csv and .xml. To search for PII for a data subject, this script searches the content of each .csv and .xml document for the variables "LastName" and "FirstName". Other variables, such as date of birth or postcode, can be added to make the search more specific. To search for PII where the contractual date has expired, the script searches through all folders in each drive for documents whose created/modified date is equal to a date variable.
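A minimal sketch of this two-step search (collect the candidate paths, then scan their content) is shown below. It is written in Python rather than the PowerShell used in the study, and the function names are hypothetical. It treats each file as plain text and looks for both names anywhere in it, which is cruder than parsing the CSV columns or XML elements.

```python
import os

SEMI_STRUCTURED = {".csv", ".xml"}


def list_documents(root, extensions):
    """Walk root recursively and return paths whose extension is in extensions."""
    found = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() in extensions:
                found.append(os.path.join(folder, name))
    return sorted(found)


def files_mentioning(paths, first_name, last_name):
    """Return the paths whose text contains both names (case-insensitive)."""
    hits = []
    for path in paths:
        try:
            with open(path, encoding="utf-8", errors="ignore") as handle:
                text = handle.read().lower()
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        if first_name.lower() in text and last_name.lower() in text:
            hits.append(path)
    return hits
```

Calling `files_mentioning(list_documents(root, SEMI_STRUCTURED), "Philomena", "Kelly")` for each drive root would yield the list of locations to report.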

(c) Unstructured
A PowerShell script was created to search through all the folders in each drive for documents of type .pdf, .docx, .txt and .xlsx. To search for PII for a data subject, this script searches the content of each .pdf, .docx, .txt and .xlsx document for the variables "LastName" and "FirstName". Other variables, such as date of birth or postcode, can be added to make the search more specific. To search for PII where the contractual date has expired, the script searches through all folders in each drive for documents whose created/modified date is equal to a date variable.
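The sketch below illustrates the two checks for unstructured documents, again in Python rather than PowerShell and with hypothetical function names. It covers only .txt and .docx, relying on the fact that a .docx file is a ZIP archive containing `word/document.xml`; .pdf and .xlsx need dedicated parsers and are deliberately left out. The date check compares the file's last-modified date only, since creation time is not available portably.

```python
import datetime
import os
import re
import zipfile


def docx_text(path):
    """Extract text from a .docx file (a ZIP archive holding word/document.xml)."""
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8", errors="ignore")
    # Crude tag stripping; adequate for a name search, not for faithful layout.
    return re.sub(r"<[^>]+>", " ", xml)


def document_text(path):
    """Return the searchable text of a .txt or .docx file, or '' if unsupported."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".docx":
        return docx_text(path)
    if ext == ".txt":
        with open(path, encoding="utf-8", errors="ignore") as handle:
            return handle.read()
    return ""  # .pdf and .xlsx are outside this sketch


def mentions_subject(path, first_name, last_name):
    """True if both names occur in the document text (case-insensitive)."""
    text = document_text(path).lower()
    return first_name.lower() in text and last_name.lower() in text


def modified_on(path, day):
    """True if the file's last-modified date (local time) equals day."""
    stamp = datetime.date.fromtimestamp(os.path.getmtime(path))
    return stamp == day
```

One limitation worth noting: Word can split a single word across several text runs, in which case the tag stripping above would break the name apart and the match would be missed.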

Testing
We detail possible PII breaches an existing small organisation might have prior to implementation of our recommendations. We look at the measures taken to check the possibility of finding documentation containing PII for a data subject and PII for a specific date throughout a Hybrid Cloud Environment and how these PII breaches have been mitigated. The focus is on the direct identification of the data subject as the research is about retrieving the PII for a data subject or a specific date. We examine the challenges a small organisation using a Hybrid cloud may face in becoming "Right to Erasure" compliant to demonstrate that it is possible to be compliant, given a set of recommendations.

Example Scenario
Upon a valid request from a data subject, an authorised user would be assigned to create and run a query over the production database, to identify all records relating to the data subject. The results would be reviewed, and confirmation given to proceed with deletion of the selected records. Figure 5 highlights the potential PII breach of GDPR Article 17 "Right to Erasure". It illustrates that structured, semi-structured and unstructured PII is held on every device within each component of the Hybrid cloud, not just the production database. It also highlights that out of the multiple users within the Hybrid environment users 1, 2, 5 and 6 could be assigned this task.
Figure 5. PII Location: A Potential Breach of GDPR Article 17 "Right to Erasure".

Many organisations would be in breach of Article 17 as they:
1. Do not have secure remote connectivity set up to access all devices on each component within the Hybrid cloud, so PII cannot be identified or accessed on every device. In this example scenario, the assigned user only has access to the production database.
2. Would run an SQL query to identify and report the location of PII in the production database without using encryption.
3. Would possibly store the output from the query in a document or file that was not encrypted.
4. Do not have audit trails in place that can be used to demonstrate reasonable measures were taken to identify, locate, report and delete PII. This example highlights that there is no guarantee which user will be assigned these tasks, making auditing and event logging harder to trace.
5. Do not have authority to access all devices and, on the devices they can access, do not have authority over all PII, so they are unable to identify, locate and report PII.
6. Have an assigned user without access to some of the passwords or cryptographic keys, who therefore cannot access all PII.
7. Retain expired PII.
8. Retain more PII than was or is required for the purpose, thinking they might use it in the future.
9. Do not know their data landscape, nor what constitutes PII, and as a result, upon receipt of a valid request from a data subject to erase their PII, think that deleting PII from the production database will suffice. In this example, the assigned user only checks the production database.
10. Have not investigated PII stored in other formats.
11. Would have run an SQL query only over the production database to locate PII, so other databases and data formats throughout the Hybrid cloud environment would have been overlooked.
12. May have deactivated expired PII in some way, but are unlikely to have identified it with a view to erasure.
13. Have no automation tool or script to identify and report the location of PII for a data subject or PII that has expired.
14. Have no access control lists or firewall rules configured to enable a user or device to access PII on every device.
15. Rely on the nightly, weekly and monthly backup cycle to erase the PII; however, there could be occasions where the monthly backup runs late, so PII would not be erased within the time limit of 30 days, which would be a PII breach.
16. May store backups and archives off-site and on tapes.
17. Would only have deleted PII from the production database.
18. Have no processes or procedures in place to document the steps taken.
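The timing risk described for the monthly backup can be checked with simple date arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 30-day limit is the figure quoted in the list above rather than a value taken from the regulation's text.

```python
from datetime import date, timedelta

ERASURE_LIMIT_DAYS = 30  # figure quoted in the text above


def erasure_deadline(request_date, limit_days=ERASURE_LIMIT_DAYS):
    """Return the last day on which the PII may still exist anywhere."""
    return request_date + timedelta(days=limit_days)


def backup_overwritten_in_time(request_date, next_monthly_run,
                               limit_days=ERASURE_LIMIT_DAYS):
    """True if the monthly backup still holding the PII is overwritten
    on or before the erasure deadline."""
    return next_monthly_run <= erasure_deadline(request_date, limit_days)
```

For a request received on 1 June, a monthly run on 28 June falls inside the window, whereas a run that slips to 2 July does not.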

Tests
Scripts using encrypted usernames and passwords were created on the management laptop to search for a data subject and to search for a specific date. These scripts report the location of PII matching the specified criteria. Using PowerShell ISE remoting, the scripts are then copied to each device and executed. An overview of the test structure is displayed in Figure 6. The PII is found in various locations within the hybrid cloud environment and is stored as a combination of (a) Structured: Access database and SQL database; (b) Semi-structured: .csv and .xml; (c) Unstructured: .docx, .pdf, .txt and .xlsx; and (d) Encrypted: .zip, Legacy Access Database. A small sample of test data was set up for a data subject named Philomena Ann Kelly, and a date was entered for the contractual expiry date. PowerShell scripts were created to retrieve PII for a data subject and PII for a specific date. The aim is to investigate whether the relevant data formats can be found on the various hosts within the hybrid cloud and interrogated to retrieve PII for a data subject (named Philomena Ann Kelly) and PII due to expire. Each component of the Hybrid environment was configured to enable the PowerShell scripts to connect, identify and retrieve the location of both PII for a data subject and expired PII.
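The grouping of file types into categories (a) to (d) can be expressed as a simple lookup, sketched here in Python with hypothetical names. The mapping is an assumption drawn from the list above: `.accdb` stands in for the Access databases, SQL Server databases are reached over a connection rather than as files and so do not appear, and `.zip` is placed in the "encrypted" category as in the text even though a ZIP archive is not necessarily encrypted.

```python
import os
from collections import defaultdict

# Assumed mapping, based on the categories (a) to (d) listed above.
CATEGORY_BY_EXTENSION = {
    ".accdb": "structured",
    ".csv": "semi-structured",
    ".xml": "semi-structured",
    ".docx": "unstructured",
    ".pdf": "unstructured",
    ".txt": "unstructured",
    ".xlsx": "unstructured",
    ".zip": "encrypted",
}


def classify(path):
    """Return the category for a path, or 'unclassified' for unknown types."""
    ext = os.path.splitext(path)[1].lower()
    return CATEGORY_BY_EXTENSION.get(ext, "unclassified")


def group_by_category(paths):
    """Group discovered paths by category so each can be sent to the right search."""
    groups = defaultdict(list)
    for path in paths:
        groups[classify(path)].append(path)
    return dict(groups)
```

Grouping the inventory this way makes it clear which search method applies to each file and which files (the "unclassified" group) still need a manual decision.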

Structured PII Held in Databases within the Hybrid Cloud
The LAN "SampleDatabase" is an SQL database stored on a virtual machine, VM1, within VMware Workstation 14 Pro. It was accessed via SSMS using Windows credentials. PowerShell commands were run as Administrator on the management device. The script invoked a query to search the database for a data subject where "empLastName" and "empFirstName" are equal to the data subject name supplied. PII was successfully retrieved for a data subject named Philomena Ann Kelly, as shown in Figure 7. The private cloud "SampleDatabase" is an SQL database stored on a virtual machine, VM3, on the ESXi-15 server. PII was successfully retrieved for the data subject named Philomena Ann Kelly from the private cloud database, as shown in Figure 8. The public cloud "SampleDatabaseBK" is an SQL database stored on Amazon AWS RDS in the public cloud. A connection was made from the management device, using SSMS on VM3 to connect to the AWS RDS instance with the RDS endpoint and SQL Server authentication. PII was retrieved successfully for data subject Philomena Ann Kelly from the AWS RDS database.

Semi-Structured PII Held in Various Locations within the Hybrid Cloud
We ran a PowerShell script that browsed through the LAN folders and sub-folders in each drive, retrieved the directory path for all documents of type .csv and .xml and wrote each directory path to a file. It then read through each of these paths and searched the content of each .csv and .xml document for values matching the data subject name. We retrieved PII for the data subject named Philomena Ann Kelly stored in the documents shown in Figure 9. We ran a similar script on the private cloud and retrieved PII for the same data subject stored in the documents shown in Figure 10. The same was done on the public cloud and resulted in the documents shown in Figure 11.

Unstructured PII Held in Various Locations within Hybrid Cloud
We ran a PowerShell script that read through all the folders and sub-folders in each drive, retrieved the directory path for all documents of types .pdf, .docx, .txt and .xlsx and wrote each directory path to a file when found. It then read through each of these paths and searched the content of each .pdf, .docx, .txt and .xlsx document for PII matching the data subject name. We retrieved PII for the data subject Philomena Ann Kelly stored in the documents shown in Figure 12. Figure 13 shows the retrieval of PII from the private cloud and Figure 14 shows the results from the public cloud search.

Post Implementation
By implementing the recommendations for "Right to Erasure", upon a valid request from a data subject, the super user would be assigned to securely identify, locate and retrieve the location of the relevant PII. The super user will manually log on to each device using the encrypted credentials, then execute the relevant script on each device to identify, locate and retrieve the location of the relevant PII. Figure 15 demonstrates that PowerShell can securely access every device within the Hybrid cloud environment. It can identify, locate and report the location of PII for both a data subject and expired PII on every device on each component throughout the Hybrid cloud environment.

This will eliminate the PII breaches as:

1. The dedicated management laptop has a file containing all the relevant credentials in encrypted form. These can only be decrypted by the super user (who encrypted them) and only on this device (where they were encrypted), enabling secure, non-intrusive remote connectivity to every device on each component within the Hybrid cloud, so that all PII can be identified and located.
2. PowerShell can encrypt and decrypt, so files containing PII will be encrypted in transit as well as at rest.
3. Credentials are stored encrypted and only the super user can decrypt them when required.
4. Files can be encrypted using an AES algorithm, so the data retrieved using the script may be encrypted. The use of specified trusted hosts adds to securing access.
5. Only one super user will carry out this task, therefore audit trails and event logs can be used to check the activity of this user and demonstrate that reasonable measures were taken to identify, locate, report and delete PII.
6. The super user has the appropriate authority to access all devices and all PII on these devices, so PII can be identified and its location reported for all PII within the Hybrid cloud environment.
7. The super user will have access to all passwords, cryptographic keys, names and IP addresses of all devices within the Hybrid cloud, enabling administrative access to all devices.
8. Expired PII is no longer retained, as PowerShell can identify and retrieve the location of all PII within the Hybrid environment so that it can be deleted.
9. Whilst identifying all the PII that is currently retained, extra PII may be identified and dealt with, but data minimisation would be carried out under GDPR Article 5 "Principles relating to processing of personal data".
10. The data landscape is clear, so all areas containing all formats of PII can now be identified and located within the Hybrid cloud environment.
11. The super user has administrative authority to all PII on every device within the Hybrid environment and, because of this, the process cannot be fully automated. A manual log on is required to every device and each script is run separately. All PII can now be identified for a data subject, as can PII where the contract has expired.
12. With all devices configured, PowerShell can be run to locate and retrieve the location of PII for both a data subject and expired PII.
13. Access control lists and firewall rules are configured to only allow the dedicated management machine and super user access to PII on every device.
14. The nightly, weekly and monthly backup cycle would erase the PII from the backups, therefore procedures are put in place so that the monthly backup is run on the 28th day of the month or the nearest weekend to the 28th day.
15. Backups and archives may be stored off-site and on tapes. These must all be encrypted and stored in a secure environment and, where feasible, erased.
16. All PII would be securely deleted from every device within the Hybrid cloud.
17. Steps undertaken would be documented so they can be used to demonstrate that reasonable measures were taken to be compliant with GDPR Article 17, "Right to Erasure".

Table 2 summarises the GDPR Article 17 "Right to Erasure" breaches prior to the implementation of the recommendations and how these have been eliminated post implementation, thus demonstrating that reasonable measures have been taken to be compliant with GDPR Article 17 "Right to Erasure".

Table 2. Summary of PII breaches before implementation and how these have been eliminated.

Recommendation to be implemented 1. Allocate a dedicated management laptop configured with trusted hosts and a file containing encrypted usernames and passwords, to connect securely with all devices.
Before using recommendations:
1. Do not have secure remote connectivity set up to access all devices on each component within the Hybrid cloud, therefore all PII cannot be identified or accessed on every device. In this example scenario, the assigned user only has access to the production database.
After using recommendations:
1. The dedicated management laptop has a file containing all the relevant credentials in encrypted form. These can only be decrypted by the super user (who encrypted them) and only on this device (where they were encrypted), enabling secure remote connectivity to every device on each component within the Hybrid cloud, so that all PII can be identified and located.

Recommendation to be implemented 2. Ensure PII is encrypted both at rest and in transit.
Before using recommendations:
2. An SQL query would be run to identify and report the location of PII in the production database, without using encryption.
3. The output from the query would possibly be stored in a document/file that was not encrypted.
After using recommendations:
2. PowerShell can encrypt and decrypt, so files containing PII will be encrypted in transit as well as at rest.
3. Credentials are stored encrypted and only the super user can decrypt them when required.
4. Files can be encrypted using an AES algorithm, so the data retrieved using the script may be encrypted. The use of specified trusted hosts adds to securing access.

Recommendation to be implemented 3. Ensure event logs and audit trails are in place.
Before using recommendations:
4. Do not have audit trails in place that can be used to demonstrate that reasonable measures were taken to identify, locate, report and delete PII. This example highlights that there is no guarantee which user will be assigned these tasks, making auditing and event logging harder to trace.
After using recommendations:
5. Only one super user will carry out this task, therefore audit trails and event logs can be used to check the activity of this user and demonstrate that reasonable measures were taken to identify, locate, report and delete PII.

Recommendation to be implemented 4. Create a super user account with administrative authority enabling full access to all devices (physical and virtual) within the hybrid environment.
Before using recommendations:
5. Do not have authority to access all devices and, on the devices they can access, do not have authority over all PII, so are unable to identify, locate and report PII.
After using recommendations:
6. The super user has the appropriate authority to access all PII on every device, so all PII formats can be identified and located within the Hybrid cloud environment.

Recommendation to be implemented 5. Super user must have access to passwords, cryptographic keys etc., and to the names and IP addresses of all devices within the Hybrid cloud.
Before using recommendations:
6. User won't have access to some of the passwords or cryptographic keys etc., therefore cannot access all PII.
After using recommendations:
7. The super user will have access to all passwords, cryptographic keys, names and IP addresses of all devices within the Hybrid cloud, enabling administrative access to all devices.

Recommendation to be implemented 6. Identify all the PII that is currently retained.
Before using recommendations:
Retains more PII than was/is required for the purpose, thinking they might use it in the future.
9. Do not know their data landscape, nor what constitutes PII, and as a result, upon receipt of a valid request from a data subject to erase their PII, think deleting PII from the production database will suffice. In this example, the assigned user only checks the production database.
10. PII stored in other formats was not investigated.
After using recommendations:
8. Expired PII is no longer retained, as PowerShell can identify and retrieve the location of all PII within the Hybrid environment so that it can be deleted.
9. Whilst identifying all the PII that is currently retained, extra PII may be identified and dealt with, but data minimisation would be carried out under GDPR Article 5 "Principles relating to processing of personal data".
10. The data landscape is clear, so all areas containing all formats of PII can now be identified and located within the Hybrid cloud environment.

Recommendation to be implemented 7. Create scripts to identify and locate PII for a data subject and where PII is due to expire, incorporating encryption.
Before using recommendations:
11. An SQL query would have been run only over the production database to locate PII, so other databases and data formats throughout the Hybrid cloud environment would have been overlooked.
12. Expired PII may be deactivated in some way, but is unlikely to have been identified with a view to erasure.
After using recommendations:
11. The super user has administrative authority to all PII on every device within the Hybrid environment and, because of this, the process cannot be fully automated. A manual log on is required to every device and each script is run separately. All PII can now be identified for a data subject, as can PII where the contract has expired.

Recommendation to be implemented 8. Use an appropriate tool (if using PowerShell, as in this instance, ensure it has been configured on each device).
Before using recommendations:
13. No automation tool or script to identify and report the location of PII for a data subject or PII that has expired.
After using recommendations:
12. With all devices configured, PowerShell can be run to locate and retrieve the location of PII for both a data subject and expired PII.
14. The nightly, weekly and monthly backup cycle would erase the PII from the backups, therefore procedures are put in place so that the monthly backup is run on the 28th day of the month or the nearest weekend to the 28th day.
15. Backups and archives may be stored off-site and on tapes. These must all be encrypted and stored in a secure environment and, where feasible, erased.
16. All PII would be securely deleted from every device within the Hybrid cloud.

Recommendation to be implemented 11. Document steps carried out.
Before using recommendations:
18. No processes or procedures in place to document.
After using recommendations:
17. Steps undertaken would be documented so they can be used to demonstrate that reasonable measures were taken to be compliant with GDPR Article 17, "Right to Erasure".

Table 3 shows that the structured and semi-structured data was found but some of the unstructured data was not. The data that was found was only picked up if the spelling was exact. Regarding the encrypted database, the encrypted password is required to gain access. PowerShell does have special coding for .zip, which was not included in the test scripts that were run, so the .zip result is not a true reflection. PII was not picked up from the recycle bin or Snapshots, so further investigation and analysis needs to be done on this.

Table 4 shows that all structured, semi-structured and most of the unstructured data was retrieved, but .xlsx documents were omitted.

Figures 16 and 17 show the file size per year for Server1 and Server2, highlighting the amount of possibly obsolete data retained by the organisation.

Figures 18 and 19 show selected document types for Server1 and Server2, highlighting the amount of structured, semi-structured and unstructured data retained and the documents that are zipped. To access .zip, special coding is required. This is the same for .pdf. So, having an overview of your data landscape can provide some insight into the type of data you need to deal with.
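The kind of overview plotted in Figures 16 to 19 can be approximated with a short inventory script. The following is a minimal sketch in Python rather than the PowerShell used in this work, and it is not the authors' script: the function name `inventory` is hypothetical, and the last-modified time is assumed to be an acceptable proxy for the age of a file.

```python
import os
from collections import Counter, defaultdict
from datetime import datetime, timezone


def inventory(root):
    """Return (bytes_per_year, files_per_extension) for every file under root."""
    bytes_per_year = defaultdict(int)
    files_per_extension = Counter()
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
            except OSError:
                # Unreadable entries are skipped in this sketch; a real audit
                # would record them for manual follow-up.
                continue
            # Assumption: the last-modified year stands in for the file's age.
            year = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).year
            bytes_per_year[year] += info.st_size
            extension = os.path.splitext(name)[1].lower() or "(none)"
            files_per_extension[extension] += 1
    return dict(bytes_per_year), files_per_extension
```

Calling `inventory` on a share returns the total bytes per year and the number of files per extension, which is enough to spot large volumes of old data and container formats such as .zip that need separate handling.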

Evaluation
A visual representation of all files for a data subject named PhilomenaAnnKelly and PHILOMENAANNKELLY1 on Server1 can be seen in Figure 20, and for administrator on Server2 in Figure 21.

Figure 21. Data held on Server 2 for administrator.

GDPR focuses mainly on data privacy rather than data security, with only 8 out of its 99 Articles dealing explicitly with technology and tools. Article 17, Right to Erasure ('Right to be Forgotten'), covers both, as you need to pass through security to access the PII, yet whilst doing this, data privacy must be kept intact both at rest and in transit. Data privacy and data security tend to overlap and can be confused.

We show that compliance with GDPR's Article 17 Right to Erasure ('Right to be Forgotten') is achievable given a set of recommendations. Listed in Table 5 are the top practical guidelines resulting from the work carried out, aimed specifically at existing PII throughout an ad-hoc hybrid cloud environment of a small business, to demonstrate reasonable measures for compliance with Article 17 of GDPR, Right to Erasure ('Right to be Forgotten'). This work confirms that compliance with GDPR's Article 17, Right to Erasure ('Right to be Forgotten'), is achievable in a Hybrid cloud storage environment if the recommendations are considered.

1. Allocate a dedicated management machine configured with trusted hosts and a file containing encrypted usernames and passwords, to connect securely with all devices.

Encryption
2. Ensure PII is encrypted both at rest and in transit.
3. Ensure event logs and audit trails are in place, to demonstrate that reasonable measures were taken to be Right to Erasure ('Right to be Forgotten') compliant.

Super User: Access and Authority
4. Create a super user account with administrative authority enabling full access to all PII on every device (physical and virtual) within the hybrid environment.
5. Have access to passwords, cryptographic keys etc.
6. Have access to the names and IP addresses of all devices in the hybrid cloud.

Personally Identifiable Information (PII)
7. Identify all the PII that is currently retained.
8. Create scripts to identify and locate PII for a data subject and where PII is due to expire, incorporating encryption.

Task Automation and Configuration Management Tool
9. Use an appropriate tool (if using PowerShell, as in this instance, ensure it has been configured on each device).

Security: Firewalls
10. Make sure the Firewall has been configured with the relevant Inbound and Outbound rules for specific ports and IP addresses, and that Access Control Lists (ACLs) are up and running.

Erasure
11. Securely delete relevant PII records (remember to empty recycle bins, clear history and, where feasible, remove from backups and archives, which may be stored off-site and on tape).
14. Document steps carried out.
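The recommendation to create scripts that identify and locate PII for a data subject can be illustrated with a small file search. The sketch below is written in Python, not the PowerShell used in this work, and is not the authors' script. It addresses two limitations noted in the results: matches that depend on exact spelling, and content held inside .zip archives. The names `normalise` and `find_subject` are hypothetical, and files are assumed to contain UTF-8 or ASCII text.

```python
import os
import re
import zipfile


def normalise(text):
    """Lower-case the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def find_subject(root, subject_name):
    """Return sorted locations under root whose content mentions subject_name."""
    needle = normalise(subject_name)
    if not needle:
        raise ValueError("subject_name must contain letters or digits")
    hits = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            try:
                if zipfile.is_zipfile(path):
                    # Search each member of the archive without extracting it to disk.
                    with zipfile.ZipFile(path) as archive:
                        for member in archive.namelist():
                            text = archive.read(member).decode("utf-8", errors="ignore")
                            if needle in normalise(text):
                                hits.append(path + "!" + member)
                else:
                    with open(path, "rb") as handle:
                        text = handle.read().decode("utf-8", errors="ignore")
                    if needle in normalise(text):
                        hits.append(path)
            except (OSError, RuntimeError, zipfile.BadZipFile):
                # Unreadable files and password-protected archives are skipped;
                # a real audit would record them for manual follow-up.
                continue
    return sorted(hits)
```

Because spacing, punctuation and case are removed before comparison, a search for PhilomenaAnnKelly would also match "Philomena Ann Kelly" and PHILOMENAANNKELLY1. The trade-off is a higher chance of false positives, so each hit would still need to be reviewed by the super user. Formats such as .pdf are read as raw bytes here and would need a dedicated parser.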

Conclusions
This research set out to examine the GDPR Article 17 "Right to Erasure" and the challenges a small organisation may face whilst implementing this right within a Hybrid cloud storage environment. It was found that compliance with GDPR's Article 17 "Right to Erasure" is achievable in a Hybrid cloud environment by following a set of recommendations. Under this legislation, an organisation must be able to erase PII without undue delay, both following receipt of a valid erasure request from a data subject and when the PII contract has expired. How the data is erased and the time it may take to do so are beyond the scope of this research; however, the research does provide recommendations that relate to the identification and location of PII for a data subject and of expired PII. This aids in the incorporation of data privacy to demonstrate that reasonable measures were taken to be compliant with the "Right to Erasure" legislation. GDPR's Article 17, "Right to Erasure", covers both data security and data privacy. Authentication is required to gain access to the PII, and encryption must be used to ensure data privacy is kept intact both at rest and in transit. The PCI-DSS Version 3.2.1 (dated May 2018), ISO 27001/2:2013 and ISO/IEC 27018:2014 standards were consulted regarding data privacy. We explored the possibility for a small organisation with an ad-hoc Hybrid cloud storage environment to demonstrate that it can achieve "Right to Erasure" compliance without having to purchase expensive third-party tools. The research conducted showed that this can be achieved by use of free, non-intrusive tools such as PowerShell. PowerShell enables encryption and decryption, so additional elements of data privacy required by GDPR and other best-practice standards documentation can also be adhered to while carrying out this task.
Usernames and passwords for the various devices can be encrypted and stored in a local file and can only be decrypted by the user who encrypted them and only on that device. The use of specified trusted hosts adds to securing access. Files can be encrypted using an AES algorithm, so the data retrieved using the script may be encrypted.
Our tests, which were carried out on a selection of structured, semi-structured and unstructured data formats stored in a variety of locations within the Hybrid cloud environment, illustrate that it is possible to securely connect to each component within a Hybrid cloud environment using a non-intrusive free tool like PowerShell and execute scripts to identify, locate and report PII for a data subject and PII that has expired. The structured data format was quite straightforward when identifying and locating a data subject; however, identifying and locating PII that had expired was initially more challenging. This was attributed to the lack of auditing configured on the database. Once auditing was implemented, it was easy to identify expired PII using the last date the data was accessed. Implementing a Super User account allowed event logs and audit trails (if not already in place) to be put in place to monitor this privileged account and to be used to demonstrate compliance. Since the Super User has such high privilege within the hybrid cloud, it is considered best practice that this process is not fully automated. Instead, it is mandatory that manual logins/authentication are carried out. Although laborious, this is eased by encrypting the usernames and passwords and storing them within an encrypted file that can only be accessed by the user who created it and from the machine on which it was created. Full administrative access is required to reach all areas within the hybrid cloud and so avoid access-denied errors. Some devices needed a device name whilst others required an IP address, but either way they must be added to the trusted hosts on the management laptop, and the management laptop must be added as a trusted host on each device. Passwords and cryptographic keys are required for scripts to access encrypted folders, files and databases.
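The structured-data case described above, locating a data subject and finding PII whose last access date falls outside the retention period, can be sketched as two parameterised queries. This is an illustrative Python example against an in-memory SQLite database, not the production database or the scripts used in this work. The table name `customers`, the columns `full_name` and `last_accessed`, and the 365-day retention period are all assumptions made for the example.

```python
import sqlite3
from datetime import timedelta


def build_demo_db(rows):
    """Create an in-memory database holding (full_name, last_accessed) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, full_name TEXT, last_accessed TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (full_name, last_accessed) VALUES (?, ?)", rows
    )
    return conn


def find_subject_rows(conn, subject_name):
    """Locate rows for a data subject, ignoring ASCII letter case."""
    query = (
        "SELECT id, full_name FROM customers "
        "WHERE full_name = ? COLLATE NOCASE ORDER BY id"
    )
    return conn.execute(query, (subject_name,)).fetchall()


def find_expired_rows(conn, today, retention_days):
    """Locate rows last accessed before the retention cut-off date."""
    # ISO 8601 dates (YYYY-MM-DD) sort correctly as plain text.
    cutoff = (today - timedelta(days=retention_days)).isoformat()
    query = "SELECT id, full_name FROM customers WHERE last_accessed < ? ORDER BY id"
    return conn.execute(query, (cutoff,)).fetchall()
```

A typical call would be `find_expired_rows(conn, date.today(), 365)`. Both functions only report where the PII is held; as in this research, review and erasure remain separate, manually authorised steps.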
All connections to the relevant devices must be closed as soon as the task is complete, and the execution policy set back to restricted. It is crucial for organisations to identify all the PII currently retained, and it is best to start with the high-risk areas of the business, such as processes dealing with payments or processes involving third parties. All scripts used must incorporate encryption, and PowerShell enables this. PowerShell must be configured on each device to enable remoting, but this must be set back to restricted once the task is complete. This is another security feature of PowerShell. Firewalls must be configured to secure specific ports and only allow the management IP address. ACLs must be set to allow the Super User through.
Once all the relevant records have been retrieved and reviewed, they must be securely deleted promptly. Although the time element and the actual erasure were beyond the scope of this research, secure deletion would need to include clearing both the history and the logs, deleting the output files from the scripts, and permanent deletion by emptying the recycle bin. The PII must also be deleted from the manual and automated system backups. There are various types of backups; however, once the PII is deleted from the production database, the nightly backup will update, then the weekly, and finally the monthly. The monthly backup could be run on the 28th day of the month to ensure that the PII has been removed within 30 days. SQL databases offer various options for managing backups.

Data privacy must come from the top down: even with the best security in place, if employees are not trained in data privacy and the correct management of PII, the organisation will still be at high risk. Technology alone cannot ensure data privacy; privacy policies and procedures must be put in place and enforced. GDPR is in place in a bid to address this by placing strict responsibilities and obligations on organisations that collect, store and process PII. To enforce this, non-compliant organisations face significant fines for breaches. Whilst it is not possible to guarantee that all PII has been identified and removed, these recommendations can be referred to by a small organisation running a hybrid cloud to demonstrate that it has taken reasonable measures to be compliant with the "Right to Erasure". Therefore, this research has identified that compliance with GDPR's Article 17 "Right to Erasure" is achievable in a Hybrid cloud storage environment.
Author Contributions: M.K. conducted the primary research. E.F. designed the framework and supervised the research. K.C. provided consultancy and helped shape the final paper. All authors have read and agreed to the published version of the manuscript.