Big Data and Visual Analytics in Health and Medicine: From Pipe Dream to Reality

The increasing move to electronic data platforms for the management of health information is creating an untapped resource with the power to change health care. These Big Data open up possibilities for better quality health care and for better and faster clinical research. A key problem with Big Data is understanding the information quickly; a Visual Analytics approach turns the initially overwhelming scale of Big Data into a valuable asset. Interactive visualization helps users understand large, multisource, variable-type, and time-varying data.


Introduction
Big Data have been defined as large datasets from a variety of sources and data types (such as numeric, text-based, and imaging data) that cannot be managed or processed using standard software tools within a reasonable time [1]. Practically, the term refers to datasets of sufficient volume, variety, and velocity to allow the discovery of new insights or new forms of value that could not be found in smaller datasets [2,3]. Visual Analytics supports Big Data by providing interactive visualizations that allow people to navigate these datasets. Visual Analytics has been defined as "the science of analytical reasoning facilitated by interactive visual interfaces" [4]. It is more than just visualization of the data; it is an approach that combines visualization, human factors, and data analysis [5]. Big Data are more powerful when combined with Visual Analytics, because the synthesized information that Big Data provide can be understood more quickly. This increases the utility of Big Data for decisions that must be made in real time and under time pressure. Although there is time for sober reflection after clinical decisions are made, the quality of care depends on making correct clinical decisions, and those decisions are shaped by the information available in real time.
Both of these emerging fields will expand the current discussion on Big Data in health, which has described the potential to generate new clinical knowledge by analyzing unstructured text-based data, to assist with knowledge dissemination through data-driven decision support tools, to translate personalized medicine (e.g., genomics) into clinical practice, and to deliver information directly to patients [6].
Both Big Data and Visual Analytics can contribute significantly to improving the quality of health care. A key tenet of quality improvement (QI) is to measure the process where improvement is sought, and this is often a single measure such as the time for a patient to receive a specific treatment [7]. However, each data point carries a large amount of supporting information (i.e., a wide dataset), which can include patient demographic information; patient co-morbidities; the time of day that the treatment happened; the specific diagnoses; text-based physician notes; the physician(s) and healthcare professionals involved in the care; the prescriptions that the patient was taking; and imaging data. This supporting information can account for a proportion of the variability in the measure that is being improved. Through interactive visualization of the measure, a QI consultant or analyst can explore the factors associated with this variability and gain a deeper understanding of possible causality. For example, patients with two or more co-morbidities may have delays in treatment because more time is needed to ensure a proper diagnosis, and patients who arrive for treatment during off-hours may have delays because certain specialists are unavailable. By applying Big Data and Visual Analytics research, various individuals can explore this wide dataset to understand the factors influencing quality of care. Generally, these disciplines can allow for: 1) the development of better QI methods; 2) the implementation of more meaningful system improvements by taking the guesswork out of the Model for Improvement's plan-do-study-act iterative cycle [7] (or Six Sigma's define-measure-analyze-improve-control process [8]); and 3) the ability to engineer sustainable improvements that achieve tightly controlled excellence in health care despite variation in patients and care settings.
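The stratified exploration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the cited literature: the records are invented, the record layout and the function names `is_off_hours` and `median_by_stratum` are hypothetical, and off-hours is assumed to mean arrival outside 08:00 to 17:59.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records (invented for illustration):
# (minutes to treatment, number of co-morbidities, arrival hour 0-23)
records = [
    (42, 0, 10), (55, 1, 14), (95, 3, 11), (120, 2, 2),
    (48, 0, 22), (60, 1, 9), (110, 2, 15), (70, 0, 3),
]

def is_off_hours(hour, day_start=8, day_end=18):
    """Assumed definition of off-hours: arrival outside 08:00-17:59."""
    return not (day_start <= hour < day_end)

def median_by_stratum(rows):
    """Group the measure by (2+ co-morbidities, off-hours); return the median per group."""
    groups = defaultdict(list)
    for minutes, comorbidities, hour in rows:
        key = (comorbidities >= 2, is_off_hours(hour))
        groups[key].append(minutes)
    return {key: median(values) for key, values in groups.items()}

strata = median_by_stratum(records)
for (multi, off), value in sorted(strata.items()):
    print(f"2+ co-morbidities={multi!s:5} off-hours={off!s:5} median minutes={value}")
```

In an interactive tool, the same grouping would be recomputed as the analyst changes the thresholds or adds factors. A difference between strata in such a table indicates where to look further; it does not by itself establish causality.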
Similarly, Big Data and Visual Analytics can contribute significantly to clinical research through the use of de-identified health data and pragmatic clinical trials [9]. Randomized Controlled Trials (RCTs) provide the highest level of evidence when translating clinical research into clinical practice [10]. However, the cost of running RCTs is often prohibitive, and more pragmatic alternatives to traditional RCTs are being discussed, including randomized registry trials [11] and pragmatic RCTs [12]. A randomized registry approach allows clinical research to be more pragmatic, generalizable, and cost-effective. The randomized registry trial as a backbone offers the potential to quickly recruit patients, test interventions, and translate successes into clinical best practice for acute stroke care. Novel software solutions can help to facilitate consent and randomization into clinical trials, provided that developers work with researchers to better understand the time constraints, ethical concerns, and privacy issues. Furthermore, electronic visualizations of the data can help ensure the anonymization required by the trial protocols. Visual Analytics can provide the ability to interact dynamically with the data and to move seamlessly between meaningful overviews and patient-specific information as needed by the researchers.
Despite this potential, creating a single-source-of-truth registry file that is updated dynamically and in real time poses many challenges. There are also challenges in using these large datasets to improve the quality of health care and to provide cost-effective ways of running clinical research. Principally, the data often reside across multiple data sources, which makes it difficult to obtain fully linked Big Data. Furthermore, it is often difficult to assess data quality and fidelity, and data of unknown quality may produce misleading results. By creating securely linked health information from all necessary sources, we can start to learn about the causes of poor quality and high variation in our complex health system [13] and be able to create solutions for the entire system [14].
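The two difficulties named above, linkage across sources and unknown data quality, can be made concrete with a small Python sketch. Everything in it is assumed for illustration: the two extracts, the pseudonymous patient IDs, the field names, and the helper functions `link_records` and `completeness` are hypothetical, and real linkage usually has to cope with inconsistent or missing identifiers, which this sketch does not attempt.

```python
# Hypothetical extracts from two separate systems, keyed by a pseudonymous patient ID.
clinical = {
    "p001": {"age": 71, "diagnosis": "ischemic stroke"},
    "p002": {"age": 64, "diagnosis": None},           # diagnosis missing
    "p003": {"age": 80, "diagnosis": "ischemic stroke"},
}
imaging = {
    "p001": {"ct_minutes_from_arrival": 18},
    "p003": {"ct_minutes_from_arrival": 25},
    "p004": {"ct_minutes_from_arrival": 40},          # no matching clinical record
}

def link_records(left, right):
    """Join two sources on patient ID; also return the IDs found in only one source."""
    shared = sorted(left.keys() & right.keys())
    linked = {pid: {**left[pid], **right[pid]} for pid in shared}
    unmatched = sorted(left.keys() ^ right.keys())
    return linked, unmatched

def completeness(records, field):
    """Fraction of records with a non-missing value for the given field."""
    if not records:
        return 0.0
    present = sum(1 for rec in records.values() if rec.get(field) is not None)
    return present / len(records)

linked, unmatched = link_records(clinical, imaging)
print("linked IDs:", sorted(linked))
print("unmatched IDs:", unmatched)
print("diagnosis completeness in clinical extract:", round(completeness(clinical, "diagnosis"), 2))
```

Reporting the unmatched IDs and the per-field completeness alongside the linked file is one simple way to make data quality visible before any analysis is run on it.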
Despite these challenges, there are current examples of Big Data solutions being used for QI in health care and for clinical research, and these show promise for moving forward. The Get With The Guidelines (GWTG) registry is a very large American registry created by the American Heart Association/American Stroke Association, involving over one thousand sites [15]. This registry has shown good data quality [16] and has spawned important QI research in stroke [17], coronary heart disease [18], and heart failure [19]. This example shows how large datasets from diverse organizations can be developed when there is a will, leading to improved outcomes for patients. On the clinical research side, the randomized registry trial has been successfully applied in Scandinavia with 5000 patients [20]. The investigators used the existing Swedish Coronary Angiography and Angioplasty Registry to obtain all baseline data, with numerous data points collected directly in the catheterization laboratory via a web interface. The extra work of randomization was done through a small randomization module within the registry. This type of registry trial can reduce the cost of running an RCT while being more efficient and providing better data quality through a centralized repository. Similarly, the Resuscitation Outcomes Consortium has set up a registry and data collection system in routine paramedic care for their pre-hospital ED trials.
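A randomization module of the kind mentioned above for the registry trial can, in principle, be quite small. The Python sketch below uses permuted block randomization, a common allocation scheme chosen here purely for illustration; the source does not state which scheme the trial used. The function name `permuted_block_allocations`, the arm labels, the block size, and the seed are all assumptions.

```python
import random

def permuted_block_allocations(n_patients, arms=("intervention", "control"),
                               block_size=4, seed=None):
    """Return an allocation list built from randomly shuffled, balanced blocks.

    Illustrative sketch only: a production module would also need allocation
    concealment, audit logging, and stratification as required by the protocol.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # a seeded generator makes the schedule reproducible
    allocations = []
    while len(allocations) < n_patients:
        # Each block contains every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_patients]

schedule = permuted_block_allocations(12, seed=2024)
for number, arm in enumerate(schedule, start=1):
    print(f"patient {number:2d}: {arm}")
```

Balanced blocks keep the arms close to equal in size at every point during recruitment, which matters when a trial enrols patients continuously from routine care.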
The new world of Big Data is changing the way collaboration occurs in health research. Multi-institution, industry, multi-centre, and international collaborations often come together to tap into the enormous potential of Big Data in health. Several groups are working toward this vision across Canada. "Open data" and "data sharing" models are increasingly being developed. In the province of Alberta, for example, initiatives are afoot to develop province-wide, consolidated health data repositories and high-performance, cloud, and agile computing platforms; additionally, expertise is being developed to explore specific health questions using Big Data and Visual Analytics. Similarly, groups are working to create a central clinical and imaging registry for stroke. Alberta is in a unique position in Canada, as it has a single health authority and a single health data custodian for the entire province. A clinical registry that can be securely populated from the electronic medical records and linked to the imaging registry to obtain a full record for each stroke patient could provide a backbone for quality improvement and for randomized registry trials running across the province. Examples in other Canadian provinces include the Southern Ontario Smart Computing Innovation Platform, a multi-university partnership with a large focus on health, and the Atlantic University Partnership on Analytic Skills, an initiative focusing largely on skills development for the novel field of Big Data and Visual Analytics.
Big Data combined with Visual Analytics show promise for improving the delivery of health services and changing how clinical research is performed. Despite the challenges of data residing across multiple sources and of unknown data quality, there are examples of Big Data registries being used to improve health outcomes and to run clinical trials. Furthermore, Canadian examples provide a glimpse into a future in which Big Data and Visual Analytics improve health outcomes and enable better and cheaper ways of running clinical trials.