Electronic Capturing of Genetic and Molecular Data in Modern Clinical Trials

Healthcare data continue to grow in size and complexity. Collecting and harmonizing disparate clinical, molecular, patient research and biospecimen data is key to effective research, but presents challenges in terms of access, organization, security and compliance. When designing an experimental clinical study, it is important to plan how genetic and molecular data will be collected and recorded in the course of the study. With the recent advent of the Internet of Things (IoT), direct recording of data could be extended to mobile devices such as wearable sensors, which capture biomedical information directly from the patient's body, and to other internet-connected remote devices such as digital scales for monitoring body fat and weight. Electronic data capture (EDC) employs different technologies and methods for data collection, although paper data collection sheets are still widely employed. EDC and the electronic case report form (eCRF) are nevertheless becoming more widespread. Their production is complex and must comply with strict regulations in both Europe and the United States, which cover system validation, security policy and data backup.


Introduction
Access to clinical data for research is, at best, in its infancy, but is likely to develop very quickly. Researchers must be aware of the difference between data collected as part of routine clinical care and data collected in a clinical trial, which by definition must follow a rigid data acquisition protocol. Data generated by traditional clinical trials are therefore less subject to bias, because standardized definitions are used and the completeness of collection is carefully documented.
Healthcare data continue to grow in size and complexity, and their collection and harmonization across disparate clinical, biological and molecular research programs is key to effective future research, but is also a challenge in terms of quality, access, organization, security and compliance.
When designing a clinical study, it is important to plan how genetic and molecular data will be generated, collected and recorded. Several actors are involved in this: the principal investigator, sub-investigators and data management staff. Traditionally these data are recorded on paper data collection forms (Case Report Form, CRF), but today they can be recorded in electronic form (electronic Case Report Form, eCRF), including on handheld devices such as mobile phones or tablets. The same is not always true for data generated retrospectively or drawn from large registries, whatever their scope. Another modern source of data is direct output into the research database from devices or instruments such as electrocardiographs (ECG), echocardiographs, magnetic resonance imaging (MRI) or tomography scanners (Direct Data Capture, DDC). With the recent advent of the Internet of Things (IoT), data could be recorded directly and continuously on mobile devices from wearable sensors or remote devices that capture biomedical information directly from the patient's body.
Electronic data capture (EDC) employs different technologies and methods for data collection, ranging from notebook computers and internet-based systems to PDAs and tablet PCs, each with different strengths and weaknesses that need to be evaluated to best match the study requirements. Despite the diffusion of these electronic options, paper collection is still widely employed, both by tradition and for legal purposes. To avoid excessive copies or faxes, technologies such as Optical Character Recognition (OCR) enable computers to "read" handwritten data and insert them automatically into a database, thus overcoming the disadvantages of paper sheets: the large volume of paper to store, space limitations and incorrect data entries.
EDC has several advantages over paper data collection sheets: (i) it supports careful real-time data quality control, (ii) it provides a suitable environment for remote data entry and (iii) it is an economical and practical approach to study sharing and auditing. For these reasons EDC and eCRF are becoming increasingly popular. However, they must comply with strict regulations in both Europe and the USA, which cover system validation, security policy and data backup [1].
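As an illustration of real-time data quality control, the following minimal Python sketch runs simple checks on a record at entry time and returns a list of queries for the data manager. The field names (`age_years`, `genotype`), the allowed ranges and the helper `run_edit_checks` are hypothetical and chosen only for this example; they are not taken from any specific EDC product or standard.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EditCheck:
    """One validation rule applied to a single eCRF field."""
    field: str
    description: str
    is_valid: Callable[[object], bool]


def _is_genotype(value: object) -> bool:
    # Hypothetical format: two alleles separated by a slash, e.g. "A/G".
    return (
        isinstance(value, str)
        and len(value) == 3
        and value[1] == "/"
        and value[0] in "ACGT"
        and value[2] in "ACGT"
    )


# Hypothetical rules; a real study would derive them from its protocol.
CHECKS = [
    EditCheck(
        "age_years",
        "age must be an integer between 18 and 100",
        lambda v: isinstance(v, int) and 18 <= v <= 100,
    ),
    EditCheck("genotype", "genotype must look like 'A/G'", _is_genotype),
]


def run_edit_checks(record: dict) -> list[str]:
    """Return one query message per missing or invalid field."""
    queries = []
    for check in CHECKS:
        value = record.get(check.field)
        if value is None:
            queries.append(f"{check.field}: missing value")
        elif not check.is_valid(value):
            queries.append(f"{check.field}: {check.description}")
    return queries


if __name__ == "__main__":
    print(run_edit_checks({"age_years": 450, "genotype": "A/G"}))
```

Running the checks at the moment of entry, rather than after transcription from paper, is what allows the site to resolve a query while the source information is still at hand.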

Genetic and Molecular Medicine Data Capture
With the advent of genetics and genomics, a considerable amount of data collected from laboratory and clinical procedures is stored for downstream translational analysis. Public repositories such as The Cancer Genome Atlas (TCGA) for cancer genetics and molecular medicine, or the Haplotype Map (HapMap) for human polymorphism data, have attracted the interest of investigators because of the volume of information they contain, which could support future discoveries through interpretation based on several types of algorithms, from classic statistical methods to recently adopted machine learning techniques [2,3]. Most of these data are collected in prospective clinical cohort studies and are therefore subject to regulatory requirements. This is an advantage, because these analytical approaches need high-quality data and robust electronic data capture software capable of collecting and storing large amounts of clinical, laboratory and instrumentation data (including data from wearable devices). Several commercial and open source projects try to address these issues. Each of them has to be validated against ICH GCP [4] and FDA 21 CFR Part 11 [5]. The Clinical Data Interchange Standards Consortium (CDISC) establishes the Clinical Data Acquisition Standards Harmonization (CDASH) standard across studies and sponsors, with the aim of providing clear traceability and delivering more transparency to regulators and data reviewers [6].

System Validation
Validation of electronic systems is mandatory to verify the accuracy of the data. There are several requirements for technical validation. Briefly, a system must have an audit trail, meaning that every change is recorded and electronically traceable, protection against unauthorized access, and regular backups. Data collected in the form of records must remain accessible for the duration of the clinical trial, and records must be available for inspection in both electronic and human-readable form. Records have to be protected from unauthorized access and must be unmodifiable during long-term storage. The system should carry out authority checks to verify that a user has permission to perform an operation or procedure, and it should check the validity of data and of operational instruction inputs. Electronic signatures must be unique to one individual and used only by their genuine owner. Signature manifestation requires that electronically signed records display the printed name of the signer, the date and time when the electronic signature was executed, and the meaning (such as review or approval) associated with the signature. Standard operating procedures (SOPs) have to be provided and followed; these must certify that the persons who develop, maintain or use electronic records, or who are involved in electronic signatures, have the training to perform their assigned tasks [7][8][9].
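The audit trail and signature manifestation requirements described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a validated implementation: the class `AuditTrail`, its methods and the function `signature_manifestation` are hypothetical names, and chaining entries with SHA-256 hashes is one possible way to make tampering detectable, not something the regulations prescribe.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditTrail:
    """Append-only log of changes; each entry stores the hash of the previous one."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(entry: dict) -> str:
        # Hash every field except the hash itself, in a stable key order.
        payload = {k: v for k, v in entry.items() if k != "hash"}
        text = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def record_change(self, user, field, old, new, reason):
        """Append who changed what, when and why (values must be JSON-serializable)."""
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "field": field,
            "old": old,
            "new": new,
            "reason": reason,
            "previous_hash": self._entries[-1]["hash"] if self._entries else "",
        }
        entry["hash"] = self._digest(entry)
        self._entries.append(entry)
        return dict(entry)

    def verify(self) -> bool:
        """Return False if any entry was altered or the chain was broken."""
        previous_hash = ""
        for entry in self._entries:
            if entry["previous_hash"] != previous_hash:
                return False
            if entry["hash"] != self._digest(entry):
                return False
            previous_hash = entry["hash"]
        return True


def signature_manifestation(printed_name: str, meaning: str, signed_at: datetime) -> str:
    """Render the three elements a signed record must display."""
    return f"Signed by {printed_name} on {signed_at.isoformat()} ({meaning})"


if __name__ == "__main__":
    trail = AuditTrail()
    trail.record_change("jdoe", "genotype", "A/G", "A/A", "transcription error")
    print(trail.verify())
    print(signature_manifestation("Jane Doe", "approval", datetime.now(timezone.utc)))
```

In this sketch the original value is never overwritten: a correction is stored as a new entry alongside the old value, the user and the reason, which is the property that makes changes traceable.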

Conclusion
The importance of high-quality genetic and molecular medicine data is evident. In the post-genomic era, biomedical data are increasing in volume, velocity and variety. Data quality is a key criterion for establishing whether a data set can be employed in downstream analysis. Exploiting the capabilities of EDC systems in terms of input validation, long-term data availability and compliance with regulatory requirements benefits the quality of research. The use of an EDC system could reduce both the time needed for data collection and the data input error rate, promoting the adoption of Good Clinical Data Management Practice (GCDMP) as required by ALCOA GxP principles.