Overview of Pharmacoepidemiological Databases in the Assessment of Medicines Under Real-Life Conditions

Introduction
Pharmacoepidemiology studies the use and effects, both beneficial and adverse, of medicines in large numbers of people. It applies the methods of epidemiology to the area of clinical pharmacology (Strom, 2005a). This field seeks to further the knowledge and science underlying post-marketing drug surveillance (studies conducted after a medicine has been released onto the market).
The raison d'être of these studies is to bridge the gap between the information generated by pre-marketing clinical trials and real-world drug usage, since drugs often do not perform as well in routine clinical practice as in clinical trials. Clinical trials may not provide an accurate picture of drug effects. At the time of authorization, we have evidence from clinical trials that demonstrates efficacy, but only for a specific indication and only within the original test population. At this stage, we also have evidence of adverse reactions, but only the most common ones (Black, 1996; Strom, 2005a).
The known limitations of clinical trials (e.g. a selected study population defined by strict inclusion and exclusion criteria; short duration; small sample size; conduct in selected sites which are typically better equipped than routine care facilities) have led to a number of well-described "unknowns", primarily the effectiveness of the medicine in normal clinical practice and its full safety profile (Roher, 2008; Montori et al, 2005; Zimmerman et al, 2002).
The current regulatory process creates an evidence-free zone at the time of launch of new medicines, and decisions are often taken under conditions of uncertainty. It is important to note that uncertainty about safety may be more common, as most studies for regulatory approval are powered to demonstrate efficacy rather than safety (Eichler et al, 2008; van Staa et al, 2008).
Eichler and colleagues argued that this efficacy-effectiveness gap is, in most cases, a problem of variability in drug response and described both biological and behavioural as sources of [...]
Moreover, an understanding of social, political, cultural and historical settings is crucial (Shatin, 2001). A lack of awareness of the limitations of these databases, and a failure to understand the methodological challenges arising in their use, can lead to the selection of inappropriate study methods and hinder the interpretation of results. When conducting pharmacoepidemiology research through databases, we should bear in mind Brian Strom's words: "Databases should not distract us from sound methodological and clinical thinking" (Strom, 2004).
To sum up, to get the complete picture and to (try to) understand everything about a drug, we must conduct both clinical trials and observational studies. Over time, this was clearly a lesson learned (Vandenbroucke, 2008), but we should do it with rigor and humility (Avorn, 2007).

Automated databases in pharmacoepidemiology research
Identifying a good research question is arguably the most important step in the research process. It must be clearly and precisely defined prior to the design and implementation of a study, as it forms the foundation for the entire project (Harpe, 2009; Vanderbrouke, 2002; Hulley, 2007). After the research question and study design are carefully constructed, a framework should be put in place which includes: selecting the database of interest; selecting the outcome, exposure and other variables of interest; understanding the limitations of the data source; and defining the quality assessment methodology (Harpe, 2009).
Pharmacoepidemiology research studies may involve either data collected prospectively for the purpose of the particular study (de novo data collection), i.e. primary data, or data that were already collected for another purpose (as part of administrative records or patient health care), which are called secondary data (Harpe, 2011).
Although pharmacoepidemiology uses all epidemiological study designs and sources, in recent decades there has been an enormous growth in the use of secondary data. So-called automated large healthcare databases, especially in North America and Europe, have proven to be a rich resource for pharmacoepidemiology and health services research (Hemmerlgarn et al, 1994; Arana et al, 2004).
These automated databases are regularly used in a variety of settings to study the use and outcomes of therapeutics and often meet the need for a cost-effective and efficient means of conducting post-authorization surveillance studies. Their size allows the study of infrequent drug effects, and their longer follow-up times and representativeness, in terms of routine clinical practice, make it possible to study real-world effectiveness, safety and utilization patterns.
According to Brian Strom, to meet the needs of pharmacoepidemiology an ideal database would include records from all inpatient and outpatient care, emergency care, all laboratory and radiological tests, and all medications, both prescribed and non-prescribed. The population covered would be large enough to detect rare events and would be stable over its lifetime, from birth to death. Moreover, it would be kept updated and have information on potential confounding variables, such as smoking status, alcohol consumption, etc. (Strom, 2005b).

www.intechopen.com
Epidemiology - Current Perspectives on Research and Practice
However, no single existing database is ideal. The information necessary for a study is often stored in separate databases. In order to gather all the information, it is often necessary to identify the same participants across databases, which can be complex (Florentinus et al, 2006). In 1946, Dunn stated that each person in the world creates a book of life, which starts with birth and ends with death, and whose pages are the records of the principal events in that person's life. Record linkage was the name given by Dunn to the process of assembling the pages of this book into a volume (Dunn, 1946). Applying this concept to the field of pharmacoepidemiology, record linkage can be defined "as a method for bringing together the information contained in two or more records - e.g., in different sets of medical charts, and in vital records such as birth and death certificates - and a procedure that each individual is identified and counted only once. This procedure incorporates a unique identifying system such as a personal identification number" (Porta et al, 2008).
Although this concept of assembling unique health information about individuals from various sources is far from new (it has been around for more than 150 years; Rawson & Shatin, 2008), it is not an easy task. It requires highly qualified human resources, such as system engineers and data managers, who should have the capacity, flexibility and familiarity with the knowledge in this demanding medical field (Takahashi et al, 2011).
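As an illustration of linkage through a unique identifier, the following is a minimal sketch in Python. It assumes that every source already carries the same patient identifier; the field names (`patient_id`, `drug`, `diagnosis`) and the records themselves are hypothetical and do not correspond to any particular database.

```python
from collections import defaultdict

def link_records(*sources):
    """Group records from several named sources by a shared patient identifier.

    Each source is a (source_name, list_of_records) pair. The result maps
    each patient identifier to a dict of {source_name: [records]}, so that
    every individual appears, and is counted, only once.
    """
    linked = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources:
        for record in records:
            linked[record["patient_id"]][source_name].append(record)
    # Convert to plain dicts so that lookups do not silently create entries.
    return {pid: dict(by_source) for pid, by_source in linked.items()}

# Hypothetical example data (assumed for this sketch only).
dispensings = [
    {"patient_id": "P001", "drug": "drug A", "date": "2010-01-15"},
    {"patient_id": "P002", "drug": "drug B", "date": "2010-02-01"},
    {"patient_id": "P001", "drug": "drug A", "date": "2010-02-14"},
]
discharges = [
    {"patient_id": "P001", "diagnosis": "condition X", "date": "2010-03-02"},
]

patients = link_records(("dispensings", dispensings), ("discharges", discharges))
```

In this sketch, patient P001 ends up with two dispensing records and one hospital discharge in a single entry, which is the "book of life" idea in miniature. Real linkage also has to deal with missing or mistyped identifiers, which this example ignores.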
On the other hand, when analyzing the data source, a number of factors must be taken into consideration, including the extensiveness and depth of the data, the quality of the database, the population covered, and the duration of the information contained in the database (Berger et al, 2009). It is important to be aware of the drawbacks and limitations of databases, which are discussed below.
Databases can be broadly divided into two general categories: administrative databases, which record transactions primarily for administrative purposes, such as claims for reimbursement from insurance companies; and electronic medical records, which are maintained for the management of patients' clinical care (Strom, 2005b).

Administrative databases
In the U.S.A. and Canada, administrative health databases were initially developed to administer payments to healthcare providers in nationally funded healthcare systems or managed care organizations (Suissa & Garbe, 2007). They were designed for billing and record-keeping purposes, where information is recorded as a by-product of financial transactions, and not for research. Limited information is required for billing, including type of insurance coverage, dates of medical service and associated diagnoses (ambulatory and hospitalization information), tests performed and prescriptions dispensed by community pharmacies. These databases usually consist of patient-level information from two or more separate files that can be linked via a unique patient identifier (Hennessy, 2006; Rawson, 2009). Typically, researchers have to submit a protocol to an ethics review committee, and they receive only the specific data needed to investigate a particular research question, in exchange for a fee charged for the time needed to extract the data from the database.
Group Health Cooperative (U.S.A.), Medicaid databases (U.S.A.) and Health Services Databases in Saskatchewan (Canada) are just a few examples of this type of database. Detailed information about these and other administrative health databases has been described previously (Strom, 2005a).

Electronic health medical records and linkage systems (non-administrative)
In contrast to administrative databases, in electronic medical records data are recorded as part of clinical patient care and not for billing purposes. These types of databases normally consist of data entered by general practitioners (GPs) (often the gatekeepers of the healthcare system) into their practice computers and are maintained primarily for documenting the patient's conditions and treatments. As medical practices increasingly became electronic (over time, computers are replacing paper as the primary medical record), a unique opportunity for pharmacoepidemiology research was opened (Strom, 2005b). In practical terms, physicians maintain records of all visits and events: diagnoses (outpatient conditions diagnosed and procedures performed by the GP; conditions diagnosed by outpatient specialists; information pertaining to hospital admissions, including hospital diagnoses), medical history, prescriptions issued, and laboratory tests (those ordered and their results). In comparison to administrative databases, medical records databases usually include much more detailed information in their patient files, such as alcohol consumption, smoking status, height and weight, although this may be missing for many patients. It is worth noting here that these types of databases are more common in Europe.
The General Practice Research Database (GPRD) is generally considered to be the largest medical records database in routine use for medical investigation (it is representative of the United Kingdom (UK) and has population-based data on over 9 million patients, with over 44 million years of follow-up time) and the one which has been used most extensively for published research in the pharmacoepidemiology field (Gelfand et al, 2005). This database was created in June 1987 as the Value Added Medical Products (VAMP) research databank. The salient feature of GPRD is its comprehensive nature, since it collects information on demographics, medical diagnoses, prescriptions, referrals to hospitals, smoking status, immunizations, weight and height and, for a growing number of patients, also laboratory results (Walley & Mantgani, 1997).
A 'proof of concept' of the feasibility and utility of implementing cluster randomized trials utilizing electronic patient records in GPRD was recently provided. An application of electronic patient records to the evaluation of health interventions, including their health impacts and effectiveness, was developed and will be tested on antibiotic prescribing for acute respiratory infection. Results from this study will provide guidance and methodological evidence concerning the use of electronic patient records and databases for implementing cluster randomized trials in primary care (Gulliford et al, 2011).
There are few published examples of randomized database studies, but this design could become more common in the near future, since it attempts to combine the advantages of randomization and observational studies (Sturkenboom, 2008).
GPRD is derived from medical records, whereas other medical record databases are often derived from pharmacy dispensing records. In the Netherlands, there is an automated pharmacy record linkage system called PHARMO that was developed in the early 1990s by Herings and Stricker. The PHARMO medical record linkage system is a population-based data system that includes the drug-dispensing records from community pharmacies and the hospital discharge records of community-dwelling inhabitants in the Netherlands. The drug-dispensing histories are linked to the hospital discharge records of the same patient using a probabilistic algorithm (based on characteristics such as date of birth, gender, and a code for the GP) that is comparable to a unique personal identifier system (Goettsch et al, 2004; Pouwels et al, 2011). The Dutch health care system has strong regulatory and reimbursement incentives for Dutch patients to frequent a single GP (gatekeeper system) and a single pharmacy, which constitutes an important feature (Leufkens & Urquhart, 2005).
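To make the idea of linkage without a unique identifier more concrete, the sketch below scores candidate record pairs on agreement in date of birth, gender and GP code, and accepts a link only when exactly one candidate reaches a threshold. This is a simplified illustration and not the PHARMO algorithm: the weights, the threshold, the field names and the records are all assumptions made for the example.

```python
# Hypothetical agreement weights and threshold (assumptions for this sketch;
# real probabilistic linkage estimates such weights from the data).
WEIGHTS = {"date_of_birth": 5.0, "gender": 1.0, "gp_code": 3.0}
THRESHOLD = 8.0

def match_score(record_a, record_b):
    """Sum the weights of the fields on which two records agree."""
    return sum(
        weight
        for field, weight in WEIGHTS.items()
        if record_a.get(field) is not None
        and record_a.get(field) == record_b.get(field)
    )

def link(pharmacy_records, hospital_records):
    """Return (pharmacy_id, hospital_id) pairs for unambiguous matches only."""
    links = []
    for p in pharmacy_records:
        candidates = [
            h for h in hospital_records if match_score(p, h) >= THRESHOLD
        ]
        if len(candidates) == 1:  # skip records with no or several candidates
            links.append((p["pharmacy_id"], candidates[0]["hospital_id"]))
    return links

# Hypothetical example data.
pharmacy = [
    {"pharmacy_id": "A1", "date_of_birth": "1950-03-12", "gender": "F", "gp_code": "GP17"},
    {"pharmacy_id": "A2", "date_of_birth": "1962-11-30", "gender": "M", "gp_code": "GP04"},
]
hospital = [
    {"hospital_id": "H9", "date_of_birth": "1950-03-12", "gender": "F", "gp_code": "GP17"},
    {"hospital_id": "H3", "date_of_birth": "1962-11-30", "gender": "M", "gp_code": "GP22"},
]

links = link(pharmacy, hospital)
```

Here A1 and H9 agree on all three fields and are linked, whereas A2 and H3 disagree on the GP code and fall below the assumed threshold. Where patients tend to stay with a single GP and a single pharmacy, as described above, such characteristics are more stable and agreement on them is more informative.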
Access to data must comply with certain rules and conditions. For example, to access GPRD, adequate computer software and hardware are required, as well as experienced data managers. Moreover, access to data is often not free of charge, and approval of the study protocol by the Scientific and Ethical Advisory Board is mandatory (Gelfand et al, 2005).
Detailed overviews with comprehensive descriptions of the strengths and weaknesses of these and similar databases are available elsewhere (Strom, 2005a).
Health care databases throughout the world diverge noticeably with regard to their representativeness of the population, the range and detail of information they contain, their data quality and completeness, and their capability to link with other sources (Schneeweiss & Avorn, 2005), as discussed below. Indeed, the minimum requirement for drug utilization research is the availability of a pharmacy dispensing database, which allows at least a descriptive analysis of drug use (Geleedst-De et al, 2010; Elseviers et al, 2007). Additionally, in specific situations, even without diagnostic information, it can be used to assess the association between drug use and adverse effects in a prescription symmetry design (Hallas, 2005). One example of such a database is the Portuguese pharmacy sales (medicines, health products and pharmacy services) database of the National Association of Pharmacies, which is a nationwide database with representative drug dispensing data from ambulatory care (79% of pharmacies) at a regional level (hmR/CEFAR Pharmacy Sales Information System) (Torre et al, 2011).
In terms of complexity, combining databases from different countries, although a challenging task, is also possible. Both the Food and Drug Administration (FDA) and the European Medicines Agency (EMA) have launched similar initiatives to develop innovative methodologies for drug safety monitoring based on the analysis of large databases. Recently, in Europe, under the EU-ADR umbrella project, a methodology was developed and tested that enabled the combining of data from electronic health record databases of various countries and types (medical records, administrative registries, record-linkage databases). This represents an enormous step in large-scale drug monitoring, namely in the early detection of safety signals (Colomba et al, 2011) and in the detection of rare drug-associated outcomes (García Rodriguez et al, 2010). In this area of cooperation, formal collaborative networks of centres active in pharmacoepidemiology and pharmacovigilance are rapidly changing the landscape of drug post-authorization studies in Europe and will have promising results in the near future (Blake et al, 2011; ENCePP, 2011).

Exposure and outcome: Definition and measurement
Once the research question is defined, it is necessary to consider what information is required to answer it.
Outcomes of interest in pharmacoepidemiology can include diseases or conditions, medical procedures, laboratory test results, or the use of a particular medication, including medication adherence and persistence (Harpe, 2011).
The use of databases, either administrative or electronic medical records, to record diagnosis information usually requires the diagnosis to be translated into a formalized coding system, frequently the International Classification of Diseases, Injuries and Causes of Death (ICD) or another classification system (e.g. OXMIS). First of all, however, it should be noted that the process of diagnosis is fundamentally probabilistic and is usually a matter of making choices amongst alternatives in the face of uncertainty (Rawson & Shatin, 2008).
Once more, with respect to the ICD classification, it is important to note that the ICD was primarily designed to record mortality and morbidity information; therefore, signs, symptoms and less specific diagnoses are often poorly recorded. When researchers use databases without detailed access to medical data, some diseases whose outcomes are poorly defined by ICD-9-CM, such as Stevens-Johnson Syndrome, are impossible to study (Strom, 2001).
If hospitalization occurs, the discharge diagnosis is the primary source of information on a particular disease. One of the most basic outcomes of interest in pharmacoepidemiology is mortality. From the inpatient viewpoint, this can be easily identified from the discharge files. Nonetheless, from the outpatient standpoint, identifying mortality may require more effort; patient identity must be known and cross-checked with vital records (Harpe, 2009).
The number of diagnoses may diverge from institution to institution, with one code being identified as the primary or principal diagnosis and all the other diagnoses (if any) identified as secondary diagnoses. Because payments and reimbursements are linked to diagnosis codes, there may be a predisposition to include codes, within certain boundaries, that could increase the amount of money paid or reimbursed for a given diagnosis. Researchers should be aware that conditions that are likely to attract a higher reimbursement or payment are more likely to be included as a discharge diagnosis (Harpe, 2011).
Here it is important to emphasize a crucial difference between the quality of outpatient diagnoses and inpatient diagnoses. With respect to claims data, if a patient goes to a hospital, the hospital charges for the care using not only an ICD (ICD-9-CM) code but also a Diagnosis Related Group (DRG). Hospitals employ people to code diagnoses for reimbursement, and inpatient diagnoses are scrutinized for errors. Conversely, outpatient diagnoses are assigned by the practitioners themselves. In this case reimbursement does not depend on the actual diagnosis, but on the procedure codes used. Since there is no incentive to use ICD-9-CM diagnosis codes or to be careful and comprehensive (data are not audited in this respect), outpatient diagnoses are the weakest link in claims databases (Strom, 2001; West et al, 2005). Other questions regarding data quality will be discussed later.
Procedures (e.g. inpatient procedures or those that are billed through the hospital or facility) and the utilization of services (e.g. the number of days spent in hospital) may also be outcomes of interest.
Finally, results of laboratory tests may be surrogates of the outcome of interest. For example, low-density lipoprotein cholesterol and glycated hemoglobin (HbA1c) are surrogate measures for the effectiveness of treatment of high cholesterol and diabetes, respectively (Cox et al, 2009). The availability of these results varies from source to source, being more frequently recorded in electronic medical records. Administrative claims data usually record whether a laboratory test was ordered, but typically not its results, unless both sources are properly linked. More commonly, administrative databases can identify final endpoints such as stroke, myocardial infarction or fractures.
In pharmacoepidemiology research, the primary exposure of interest is often the drug exposure. It must be noted, however, that diseases, conditions or procedures may also be exposures of interest.
Drug use information in databases is not susceptible to recall and interviewer bias.
Nevertheless, amongst databases we can find several degrees of accuracy. The most frequently used and the most accurate measurement of drug exposure is outpatient claims prescription/pharmacy records (Cox et al, 2009). When a patient goes to a pharmacy and gets a drug dispensed, the pharmacy bills the insurance carrier for the cost of that drug, and has to identify which medication was dispensed and all drug attributes (milligrams per tablet, number of tablets, etc.). Since this process involves reimbursement, claims are often audited, which results in a high level of data quality (Strom, 2005b). This level of quality is also true for pharmacy records. Pharmacy records in the Netherlands can provide high quality data because there is universal computerization of pharmacy records and there have been strong regulatory and reimbursement incentives, over time, for Dutch patients to frequent a single pharmacy (and also a single GP). This enables the gathering of complete drug histories and therefore enhances the longitudinal nature of the medication data (Leufkens & Urquhart, 2005).
In the case of prescription/pharmacy records, there is no uncertainty about whether a patient actually fills a prescription (unlike GPRD data); however, uncertainty remains about whether the patient, after filling a prescription, actually takes the drug as prescribed. Studies based on prescription and dispensing records assume that most people who receive a prescription for a drug actually take it (Jick et al, 1991), so the remaining uncertainty is presumed to be small.
Electronic medical records are another data source that may be used to identify drug exposure, recording whether the physician prescribed medication for the patient, the dose and the intended regimen. It is, however, much less accurate than claims/pharmacy records. In the U.S.A., it should be noted that the FDA does not accept electronic medical records as a source for measuring drug exposure with respect to effectiveness assessment (Cox et al, 2009). Even so, prescription data in the GPRD are known to be well documented, since GPs use the computer to generate prescriptions and these are automatically recorded in the database (Herrett et al, 2010). It is also noteworthy that, unlike medical records (where information about drugs is often derived only from the GP, and specialist-prescribed data are often not available), prescription claims from administrative databases (or pharmacy records) are derived from pharmacy billing records, regardless of the prescriber (as long as the drug is dispensed by a pharmacy that submits a bill to the administrative system), and thus reflect all drugs dispensed.
Just like diseases, drugs are assigned codes in secondary databases. There is a clear need for a standardized classification system for drugs that can be used as a common language for describing and quantifying drug use. The National Drug Code (NDC) and the Anatomical Therapeutic Chemical (ATC) Classification System developed by the World Health Organization are commonly used as drug classification systems. Even though these coding schemes can be used to measure categorical drug exposure (exposed vs non-exposed), it is also important to quantify drug exposure. We can use the information available in prescription claims, pharmacy records or potentially in electronic medical records, and the amount of drug over the study period can be calculated for a given patient (total exposure) or transformed into a daily dose (average amount of drug per day). The identification of the exposure date is essential, and this is especially true when the outcome must be determined to have occurred after the exposure (Harpe, 2011).
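The quantification described above can be sketched as follows. This is a minimal illustration assuming that each dispensing record carries a date, a tablet strength and a number of tablets; the field names, the records and the outcome date are hypothetical.

```python
from datetime import date

# Hypothetical dispensing records for one patient (assumed field names).
dispensings = [
    {"date": date(2010, 1, 1), "strength_mg": 20, "quantity": 30},
    {"date": date(2010, 1, 31), "strength_mg": 20, "quantity": 30},
]

def total_exposure_mg(records):
    """Total amount of drug dispensed, in milligrams."""
    return sum(r["strength_mg"] * r["quantity"] for r in records)

def average_daily_dose_mg(records, period_start, period_end):
    """Average amount of drug per day over the study period (both ends included)."""
    days = (period_end - period_start).days + 1
    return total_exposure_mg(records) / days

def first_exposure_date(records):
    """Date of the first dispensing, used to order exposure and outcome in time."""
    return min(r["date"] for r in records)

total = total_exposure_mg(dispensings)
daily = average_daily_dose_mg(dispensings, date(2010, 1, 1), date(2010, 3, 1))

outcome_date = date(2010, 2, 20)  # hypothetical outcome date
outcome_after_exposure = outcome_date > first_exposure_date(dispensings)
```

With these assumed records, 1200 mg are dispensed over a 60-day study period, giving an average of 20 mg per day, and the outcome falls after the first dispensing. The calculation reflects what was dispensed, not what the patient actually took.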
The ATC/DDD (defined daily dose) system is of paramount importance to drug utilization research in order to improve the quality of drug use. The DDD is a stable drug utilization metric that enables comparisons of drug consumption between healthcare systems, regions and countries and therefore makes it possible to examine trends in drug use over time and in different contexts. This is the purpose for which the system was developed, and it is with this purpose in mind that all decisions about ATC/DDD classification are made.
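One common way of expressing consumption with this metric is the number of DDDs per 1000 inhabitants per day. The sketch below shows the arithmetic under stated assumptions: the DDD value, the amount dispensed and the population size are hypothetical figures chosen for the example, not values taken from the ATC/DDD index.

```python
def ddd_per_1000_inhabitants_per_day(total_mg, ddd_mg, population, days):
    """Convert a total dispensed amount into DDDs per 1000 inhabitants per day.

    total_mg   -- total amount of the drug dispensed in the period (mg)
    ddd_mg     -- defined daily dose of the drug (mg); hypothetical here
    population -- number of inhabitants covered by the data source
    days       -- length of the period in days
    """
    number_of_ddds = total_mg / ddd_mg
    return number_of_ddds * 1000 / (population * days)

# Hypothetical figures: 7,300,000 mg dispensed in one year to a population
# of 100,000, for a drug with an assumed DDD of 20 mg.
consumption = ddd_per_1000_inhabitants_per_day(
    total_mg=7_300_000, ddd_mg=20, population=100_000, days=365
)
```

With these assumed inputs the result is 10 DDDs per 1000 inhabitants per day, which can be read as roughly 1% of the population receiving the drug daily, provided the dose actually used is close to the DDD. Because the unit is independent of pack size and price, the same calculation can be repeated for different regions or years and the results compared.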

Advantages and limitations of automated databases
The process of using databases for research involves careful selection and definition of research questions, using appropriate methods for analysis and interpretation of results, and selection of appropriate data sources. A thorough understanding of the advantages and limitations of data sources and methods is essential.
There are a number of clear advantages to using databases for pharmacoepidemiology research. These include allowing research to be conducted in a real-life setting, which permits the analysis of populations that are often excluded from clinical trials, particularly elderly people and women (Gurwitz et al, 1992; Lee, 2001). Databases can also help to enhance the representativeness of research. Some databases provide population-based data which cover the entire population of a region and are fully representative of the population (Assimes et al, 2009). Moreover, their large sample size (patient numbers ranging from several hundred thousand to several million) allows the study of drugs that are used relatively infrequently and of rare effects in large populations (Suissa, 2004).
Their computerized longitudinal data on routine clinical care facilitate drug utilization studies as well as the study of effectiveness and safety (including rare and late events, and also events from chronic exposure) in a real-world setting (Takahashi et al, 2011). Data are already collected and stored in computerized format in automated databases, which means that research can be undertaken at a relatively low cost and the amount of time required to complete a study is reduced. Databases do not require informed patient consent and are therefore less prone to bias from nonresponse (Suissa & Garbe, 2007).
Another advantage of databases is that they can demonstrate precise drug dispensing patterns since they avoid recall and interviewer bias, as they do not rely on patient recall or interviewers to obtain their data, a typical concern with primary data collection (Brenda et al, 1994).
Finally, because secondary databases can link together various forms of healthcare information, they may include a wide variety of data (Harpe, 2009), and their applications vary broadly, including relative (comparative) effectiveness, cost-effectiveness, drug utilization, active safety surveillance, healthcare costs and resource use, among others (Takahashi et al, 2011).
Automated databases are not, however, without limitations. As mentioned above, administrative databases are typically designed for billing and record-keeping purposes and not for research. Even electronic medical records data are recorded as part of clinical patient care and, again, not for research purposes. The potential errors that may occur at many points during the record-keeping process were described by Schneeweiss and Avorn (Schneeweiss & Avorn, 2005).
The major weakness of automated databases is related to the quality of the data entered into them; the validity of the diagnostic information contained in the database is often uncertain. This is especially true for administrative/claims databases (where diseases are primarily coded for "billing" and not for research purposes), particularly for outpatient data. However, it is less problematic for inpatient diagnoses and for electronic medical records (Strom, 2005b), as discussed below.
Databases, particularly administrative databases, lack information on some potential confounding variables, namely data on smoking, alcohol consumption, date of menopause/menarche and reproductive history in women, physical activity, occupation, etc. In addition, information about disease severity is frequently lacking, and therefore it may not be possible to exclude confounding by disease severity (Strom et al, 1991). All these variables can be of great importance when investigating specific research questions (Park & Stergachis, 2008).
Obtaining medical records to evaluate the validity of diagnosis data (especially when the outcome is poorly defined by the diagnostic coding system) can be essential. However, it is not always easy to get access to primary medical records to collect these data, or other data related to the patient's clinical care, due to privacy and legal issues (Mackenzie et al, 2011), or simply because they are not available. Even when they are available, this is a time-consuming and expensive process (Rawson & Shatin, 2008).
Databases often do not include data on medications obtained without a prescription (e.g. over-the-counter (OTC) medicines), which may hamper the assessment of associations between non-prescription medicines and diseases (Ilkhanoff et al, 2005) or leave a confounding variable unmeasured. Similarly, medication obtained outside of a particular insurance carrier's prescription plan, for example in the U.S.A., also represents a limitation (Strom, 2005b).
Finally, high membership turnover in Health Maintenance Organizations can lead to population instability over time (Schneeweiss & Avorn, 2005). This means that data might not be representative of the population being studied (some databases cover the entire population, while others, for example, cover the elderly, those on low incomes, or those with higher educational achievements). These are limitations that should be taken into consideration, particularly when using administrative databases for pharmacoepidemiological research.

Evaluation of the quality of automated databases for pharmacoepidemiology research: An emphasis on validity and completeness
As research becomes increasingly focused on real-world data, including automated databases, it is imperative to know how to evaluate the quality of these data sources.
In order to assess the association between drug use and outcome (adverse or beneficial effects) throughout databases, two fundamental concerns should guide the evaluation of data sources: validity and completeness.Deep awareness of these issues is crucial, as emphasized by van Staa's critique of a recent validation study of a database (van Staa & Parkinson, 2008).
Errors in estimation are conventionally classified as either random or systematic.Both types of errors can occur in measuring exposure and outcome.Systematic errors are commonly referred as bias; the opposite of bias is validity, hence a valid estimate has little systematic error.In the same way, an estimate with little random error is considered precise, thus the opposite of random error is precision.Validity and precision are both components of accuracy.Furthermore, validity is usually broadly divided in two components: internal validity (the validity of the inferences drawn as they pertain to the member of study population) and external validity or generalizability (the validity of inferences as they pertain to people outside the population) (Rothman et al, 2008).These key definitions should always be considered when assessing the quality of databases.
Misclassification of drug exposure and outcomes can occur and lead to bias (misclassification bias), with the magnitude and direction of the bias depending on the mechanism of the misclassification (Csizmadi & Collet, 2005). The validity of drug exposure and outcomes, as well as covariates, can be measured by their sensitivity and specificity.
Sensitivity of exposure or outcome information is defined as the proportion of all people in the covered population who are truly exposed, or truly have the health outcome, that appear in the database and are therefore classified as exposed or as positive for the health outcome. The specificity of a database with respect to drug exposure or health outcome is the proportion of people who are truly unexposed, or truly free of the outcome, that are correctly classified as unexposed or negative for the outcome (Rothman et al, 2008; Park & Stergachis, 2008).
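As an illustration of these definitions, the short sketch below computes sensitivity and specificity from a 2x2 validation table. The counts are hypothetical and the reference standard is simply assumed to be correct; the function name `validity_measures` is ours, not taken from any cited source.

```python
def validity_measures(true_pos, false_pos, false_neg, true_neg):
    """Sensitivity and specificity of database records against a reference standard.

    true_pos  : truly positive and recorded as positive in the database
    false_neg : truly positive but missing (or negative) in the database
    false_pos : truly negative but recorded as positive in the database
    true_neg  : truly negative and recorded as negative in the database
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical validation sample of 1,000 patients (illustrative counts only).
sensitivity, specificity = validity_measures(
    true_pos=80, false_pos=10, false_neg=20, true_neg=890
)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.3f}")
```

In this made-up sample the database captures 80 of 100 true cases (sensitivity 0.80) and correctly labels 890 of 900 non-cases (specificity about 0.989).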
One way to measure the validity of outcomes and exposures recorded in databases is to compare them with a gold standard. For outcome measures in administrative databases, the gold standard is often the medical record or patient self-report; measured against these, administrative data show a high level of specificity but a great deal of variability in sensitivity across diagnoses (Strom, 2001; Wilchesky et al, 2004). However, Schneeweiss & Avorn have argued that a lack of specificity in the outcome measurement is worse than a lack of sensitivity in most situations, and highlighted the fact that if the specificity of the outcome assessment is 100%, relative risk estimates are unbiased (Schneeweiss & Avorn, 2005). This suggests that, for certain conditions, outpatient diagnoses derived from administrative databases are not of such poor quality. On the other hand, generally speaking, it is important to note that the validity of diagnosis data in medical records is better than in claims databases, as these data are used for medical care and not just for billing purposes.
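The argument about specificity can be checked with a small numerical sketch. It assumes nondifferential outcome misclassification (the same sensitivity and specificity in exposed and unexposed patients); the risks and the function name `observed_rr` are hypothetical and chosen only for illustration.

```python
def observed_rr(risk_exposed, risk_unexposed, sensitivity, specificity):
    """Relative risk expected when the outcome is measured with error.

    Assumption: misclassification is nondifferential, i.e. sensitivity and
    specificity are identical in exposed and unexposed patients.
    """
    def observed_risk(true_risk):
        # Recorded cases = detected true cases + false positives among non-cases.
        return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)

    return observed_risk(risk_exposed) / observed_risk(risk_unexposed)


# Hypothetical true risks: 10% in exposed, 5% in unexposed (true RR = 2).
rr_perfect_specificity = observed_rr(0.10, 0.05, sensitivity=0.60, specificity=1.00)
rr_imperfect_specificity = observed_rr(0.10, 0.05, sensitivity=1.00, specificity=0.95)

print(round(rr_perfect_specificity, 3))    # 2.0: unbiased despite 60% sensitivity
print(round(rr_imperfect_specificity, 3))  # about 1.487: biased toward the null
```

Under these assumptions, missing 40% of true cases leaves the relative risk intact, whereas a 5% false-positive rate pulls it from 2 toward 1.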
Missing subjects, exposures or events can also introduce bias in the study results. As an illustration, bias may be introduced in the association between a drug and a serious adverse drug reaction if hospitalizations due to that adverse reaction are missing from the database (Hartzema et al, 2011). Another example concerns potentially stigmatizing diagnoses, which may be more likely to be written in free-text fields so that they do not appear in summary records (resulting in false negatives). Evans and colleagues have estimated that only 50% of HIV patients have their diagnosis coded in their primary care records (Evans et al, 2009).
For reasons previously mentioned in this chapter, prescription claims and pharmacy records provide some of the best data in pharmacoepidemiology. A Dutch study investigated the validity of medicine exposure measurement based on pharmacy records compared with a home inventory and concluded that pharmacy records can be a reliable source of the true medicine exposure when adequate attention is given to the definition of the time window and when these records are comprehensive with regard to prescription drugs (Lau et al, 1997). However, as mentioned before, since there is no information about compliance with therapy, uncertainty remains about true drug consumption.
Another issue that should be considered when interpreting results derived from database studies is testing bias. Laboratory testing in clinical practice is never a random process (physicians selectively request tests for patients with a high probability of abnormal results, and are less likely to do so for other patients), which means that results obtained from databases may be biased and, in certain circumstances, need careful interpretation. A study conducted by Velthove and colleagues demonstrated that requests for neutrophil counts in the Utrecht Patient Oriented Database were associated with the underlying disease, particularly cardiovascular disease, emphasizing the importance of evaluating this type of bias in the context of database studies (Velthove et al, 2010). Moreover, when analyzing trends over time, we need to be aware of certain issues regarding the recording of laboratory test results, particularly the availability in the database of recording codes for laboratory tests and of their results (Dial et al, 2005).
The generalizability (external validity) of a database can be defined as the degree to which the population covered by the database is representative of the total population. At a glance, the generalizability of a database can be assessed by: 1) statistical comparison of sociodemographic information (age, gender, education level, occupation, among others) between the population covered by the database and the external target population or 2) evaluation of the eligibility criteria for enrollment defined by the organizations that own the database. Little hard evidence is available about generalizability; however, concern has been raised about the applicability of data from Health Maintenance Organization plans in the U.S.A. to the broader population (Park & Stergachis, 2008; Strom, 2005b). As a result, results derived from different databases should be interpreted carefully, since like-for-like comparisons may not be possible when the populations differ in their attributes.
Taking into consideration the diversity of characteristics in databases, Park and Stergachis have defined completeness of data coverage as the extent to which all filled prescriptions, all coded diagnoses for outpatient visits and hospitalizations, exposure to non-prescription medications, and potential confounding patient factors appear as variables in the database (Park & Stergachis, 2008). In other words, completeness can be defined as the proportion of all cases (exposures and outcomes) occurring in the population covered by the database that are recorded. Finally, it should be noted that, given the importance of this matter, the International Society of Pharmacoepidemiology has developed guidelines for the conduct of database research in pharmacoepidemiology in order to assist researchers in the selection and use of databases, highlighting limitations as well as quality and validation procedures (Hall et al, 2011). Nevertheless, given their utility in the field and the number of promising database developments on the horizon, ongoing updates should be put in place (Rothman & Poole, 2007).

Use of automated databases: Pitfalls and methodological challenges
The nonrandomized world of research on drug effects is complex, certainly more complex than the world of randomized trials (Schneeweiss, 2009). Methods and strategies employed to evaluate intended and unintended drug effects in a real-life context require rigorous epidemiology. Contrary to any preconceptions, the use of databases for pharmacoepidemiology research is challenging: "It is not a simple process of get some data and do some statistics" (Harpe, 2009). Challenges of conducting studies through databases include concerns about study design, careful understanding of the underlying health care system in which the data were generated, data quality, limited ability to control confounding in the absence of randomization and to handle bias, and data analysis, among others (Berger et al, 2009).
More than two decades ago, the growing enthusiasm for automated databases was already accompanied by harsh criticism. In 1989, Samuel Shapiro criticized several studies that had been published in the 1980s without adequate appraisal of data, study design and epidemiologic methods, which had, in turn, led to spurious conclusions (Shapiro, 1989a). At that time, Shapiro claimed that the basic concepts of pharmacoepidemiology should not be abandoned as a result of using the so-called "modern" automated databases and argued that the basics (e.g. use of proper definitions of outcome and exposure, addressing bias and confounding, etc.) should be kept in mind when using those sources of information. This publication raised a lot of controversy, and an extensive methodological soap opera about the issue ensued (Strom, 1989; Faich & Stadel, 1989; Strom & Carson, 1989; Shapiro, 1989b; Tilson et al, 1989; Jick & Walker, 1989).
Advances in both epidemiology and biostatistics over the past 20 years, including sophisticated software for analysis, have allowed novel methods for addressing confounding and bias to develop. Yet, despite the known limitations of databases, poor quality in the conduct and reporting of pharmacoepidemiologic studies still exists, and there is definitely room for improvement. There are several examples of poorly conducted database studies that were published even in the top journals with the highest impact factors (Suissa, 2007).
In the first decade of this century, observational studies showed surprising results associated with the use of statins. Statins appeared to be a kind of nonspecific miracle drug, reducing the fracture rate by 50% (Meier et al, 2000), the rate of dementia by 71% (Jick et al, 2000), the rate of depression by 60% and the rate of suicidal behavior by 50% (Yang et al, 2003), and all-cause mortality in patients with chronic obstructive pulmonary disease (COPD) (Søyseth et al, 2007), amongst other miracles: statin treatment was good for everything! Many interesting theories were developed to explain these associations, often leaving basic clinical pharmacology and epidemiologic principles behind. In addition to the classic example of statins, several other studies conducted through health-care databases have reported impressive results for commonly used drugs, for example, in reducing all-cause mortality (Hippisley-Cox & Coupland, 2005). The problem appeared when discordant results of pharmacoepidemiology studies were revealed, either between studies that used different sources of information (different databases) or even between studies conducted in the same database! For example, various studies have used the UK General Practice Research Database (GPRD) to evaluate the same side effects of drugs, often arriving at opposite conclusions. The associations between statins and fracture (de Vries et al, 2006) and between oral bisphosphonates and cancer (de Vries F, 2010) are just two examples, but there are several others, whether conducted with different databases or with the same one.
On which study is drug-therapy decision making to be based? (Etminan et al, 2006). How can investigator choices that may change results (investigator bias) be dealt with? The existence of guidelines and forms by themselves will probably not change this picture. Although sometimes neglected, understanding methodological issues and training in basic epidemiology are essential.
Pharmacoepidemiology database research is challenging for researchers, peer-reviewers, editors and readers of medical journals (Suissa, 2007). Some pitfalls and methodological challenges are illustrated below.

Immortal time bias
Immortal time in epidemiology refers to a period of follow-up during which, by design, death or the outcome of the study cannot occur (Rothman & Greenland, 2008). This bias was first identified in the 1970s in the context of heart transplantation research (survival benefit of heart transplantation) and recently reappeared in pharmacoepidemiology, particularly in database studies reporting that several drugs can be extremely effective at reducing morbidity and mortality (Suissa, 2008).
Several poorly analyzed studies employed a time-fixed definition of exposure to emulate the intention-to-treat analysis used in clinical trials. This approach assumes that subjects are exposed to the drug under study immediately at the start of follow-up, which cannot occur in the real world and is, in fact, unknown in database studies.
Immortal time can arise when the period between cohort entry and the date of first exposure (e.g. to a drug), during which death or the outcome has not occurred, is either misclassified or excluded and therefore not properly accounted for in the analysis. Immortal time bias is particularly problematic because it necessarily biases the results in favour of the treatment under study by conferring a spurious survival advantage on the treated group (Lévesque et al, 2010). The appropriate approach to data analysis requires that all immortal time be accounted for fully, including the time before the start of exposure, so that it is correctly classified in terms of exposure. The extent of the bias depends directly on the amount of person-time misclassified or excluded. Consequently, the longer the exposure window, the larger the bias (Suissa, 2007).
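The mechanism can be illustrated with a minimal sketch on a hypothetical cohort of seven patients, constructed so that the drug has no effect on mortality. The data, the tuple layout and the function name `rate_ratio` are all assumptions made for illustration; a real analysis would typically use a time-dependent Cox model rather than crude rates.

```python
# Each patient: (years of follow-up, year of first prescription or None, died?)
# Hypothetical data in which the true mortality rate is 0.1 per person-year
# both on and off the drug.
cohort = [
    (4, None, True), (4, None, True), (4, None, False),   # never treated
    (4, 2, True), (4, 2, False), (5, 2, False), (5, 2, False),  # treated from year 2
]


def rate_ratio(patients, time_dependent):
    """Mortality rate ratio (exposed vs. unexposed) under two exposure definitions."""
    person_years = {"exposed": 0.0, "unexposed": 0.0}
    deaths = {"exposed": 0, "unexposed": 0}
    for followup, first_rx, died in patients:
        if first_rx is None:
            person_years["unexposed"] += followup
            deaths["unexposed"] += died
        elif time_dependent:
            # Time before the first prescription is unexposed (immortal) time.
            person_years["unexposed"] += first_rx
            person_years["exposed"] += followup - first_rx
            deaths["exposed"] += died
        else:
            # Time-fixed definition: the whole follow-up is counted as exposed.
            person_years["exposed"] += followup
            deaths["exposed"] += died
    rate_exposed = deaths["exposed"] / person_years["exposed"]
    rate_unexposed = deaths["unexposed"] / person_years["unexposed"]
    return rate_exposed / rate_unexposed


biased_rr = rate_ratio(cohort, time_dependent=False)
correct_rr = rate_ratio(cohort, time_dependent=True)
print(round(biased_rr, 2), round(correct_rr, 2))  # 0.33 1.0
```

With the time-fixed definition, the eight immortal person-years are credited to the drug and it appears to cut mortality by two thirds; once that time is classified as unexposed, the rate ratio returns to 1.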
Although these biases have typically been addressed in cohort studies conducted using databases, they can also occur in case-control studies, most of which in pharmacoepidemiology are conducted using databases as well. It is therefore essential to ensure an equal time window for measuring exposure in cases and in controls (Suissa et al, 2011). In these studies, if time is not properly considered in the selection of controls, an artificial appearance of effectiveness for the drug will be generated.
Recently, Samy Suissa illustrated time-related biases in 20 published studies, most of them in respected journals, and called for a reassessment of all those studies for immortal time bias (Suissa, 2007).

Confounding (by indication bias)
Confounding is a challenging threat to validity in nonrandomized studies of drug effects. Automated databases, especially administrative databases, have been criticized for the incompleteness of their information on potential confounders (over-the-counter (OTC) drug use, lifestyle habits, smoking status, body mass index, markers of clinical disease severity, among others).
Moreover, such factors may lead to selective prescribing of drugs, which may, in turn, result in biased estimates of the association between drugs and outcomes: confounding by indication (Walker, 1996). The prescription of a drug is based on the diagnostic and prognostic information available at the time of prescribing and also on other factors, such as behavioral characteristics of both physician and patient. Generally speaking, a drug is more likely to be prescribed to a patient with more severe disease who, in turn, is more likely to experience an adverse outcome of the disease. The problem of confounding by indication can be illustrated in a way that is very similar to selection bias. The basic idea is that subjects who receive a certain drug are intrinsically different from those not receiving the drug.
Strategies to adjust for such confounding vary depending on whether the potential confounders are measured in a given database. If they are, the usual strategies for controlling confounding can be applied: restriction, stratification, matching, and multivariable modeling (Schneeweiss & Avorn, 2005).
On the other hand, confounding by indication is extremely difficult to control, even when the reason for prescribing is straightforward, mainly because the precise reason for prescribing is rarely measured. That is because "indication" is a complex and multifactorial phenomenon, as described above (Csizmadi & Collet, 2005). Thus, control for confounding by indication, if possible, should be tackled at the design level (e.g. by restricting the study to a group of patients homogeneous with respect to disease severity or by comparing two drugs only for the same indication).
Studies with potential confounding by indication can benefit from other appropriate analytic methods, explained below, including separating the effects of a drug taken at different times, sensitivity analysis for unmeasured confounders and instrumental variables.
Sensitivity analysis is defined as a quantitative analysis of the potential for systematic error (Csizmadi & Collet, 2005). Basic sensitivity analyses of residual confounding (unmeasured confounders) try to determine how strong and how imbalanced a confounder would have to be among drug categories to explain the observed effect (Schneeweiss & Avorn, 2005). Schneeweiss provided a systematic approach to sensitivity analyses to investigate the impact of residual confounding in studies conducted through databases and argued for more frequent application of sensitivity analyses and external adjustments in place of qualitative discussions of residual confounding (Schneeweiss, 2006).
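A basic sensitivity analysis of this kind can be sketched as follows. The sketch uses the classical external-adjustment formula for a single unmeasured binary confounder, in which the observed relative risk is divided by a bias factor determined by the confounder-outcome relative risk and the confounder prevalence in each exposure group. All numerical inputs are hypothetical, the function name `externally_adjusted_rr` is ours, and the formula assumes no effect modification by the confounder.

```python
def externally_adjusted_rr(rr_observed, rr_confounder_outcome,
                           prev_exposed, prev_unexposed):
    """Adjust an observed RR for one unmeasured binary confounder.

    rr_confounder_outcome : RR linking the confounder to the outcome
    prev_exposed          : confounder prevalence among the exposed
    prev_unexposed        : confounder prevalence among the unexposed
    """
    bias_factor = (
        (prev_exposed * (rr_confounder_outcome - 1) + 1)
        / (prev_unexposed * (rr_confounder_outcome - 1) + 1)
    )
    return rr_observed / bias_factor


# Hypothetical scenario: an apparently protective drug (observed RR = 0.5) and an
# unmeasured risk factor (RR = 3 for the outcome) that is less common among
# users (10%) than among non-users (40%).
adjusted = externally_adjusted_rr(0.5, 3.0, prev_exposed=0.10, prev_unexposed=0.40)
print(round(adjusted, 2))  # 0.75

# Scanning a range of assumed confounder strengths shows how strong the
# confounder would have to be to explain the observed effect entirely.
for strength in (1.5, 3.0, 6.0, 12.0):
    print(strength, round(externally_adjusted_rr(0.5, strength, 0.10, 0.40), 2))
```

In this made-up scenario half of the apparent protective effect disappears after adjustment, and only a very strong, highly imbalanced confounder would move the estimate close to the null.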
Propensity scores and instrumental variables can be used to approximate the random allocation process. The propensity score is defined as the conditional probability of being treated, given an individual's covariates; the objective of the propensity score is to simulate RCT treatment groups in order to estimate a causal treatment effect (Csizmadi & Collet, 2005). The purpose of propensity scores is to create groups that are similar with respect to all measured characteristics except treatment status (Wang & Donnan, 2001). Propensity score analyses, however, cannot address bias when important variables are not included in the propensity score estimation (Johnson et al, 2009). If several conditions are fulfilled, instrumental variables (IV) have the potential to yield unbiased estimates in database studies (Brookhart, 2010). IVs have the potential to adjust for all confounders, whether observed or not. The idea is that the causal effect of exposure on outcome can be captured by using the relationship between the exposure and another variable, the IV (Martens et al, 2006).
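To make the propensity score idea concrete, the sketch below uses a hypothetical cohort with one measured binary confounder (disease severity) and a drug that has no true effect. With a single binary covariate the propensity score is simply the proportion treated within each severity stratum; in practice it would be estimated with a regression model on many covariates. Inverse probability of treatment weighting is used here as one of several possible ways to apply the score (matching and stratification are others); the data and function names are illustrative assumptions.

```python
from collections import defaultdict

# (severe_disease, treated, n_patients, n_events): hypothetical counts in which
# the outcome risk depends on severity (50% vs 10%) but not on treatment.
strata = [
    (1, 1, 80, 40),
    (1, 0, 20, 10),
    (0, 1, 20, 2),
    (0, 0, 80, 8),
]


def propensity_by_severity(rows):
    """Propensity score = proportion treated within each severity stratum."""
    treated, total = defaultdict(int), defaultdict(int)
    for severe, is_treated, n_patients, _ in rows:
        total[severe] += n_patients
        if is_treated:
            treated[severe] += n_patients
    return {severe: treated[severe] / total[severe] for severe in total}


def risk_ratio(rows, weighted):
    """Treated vs. untreated risk ratio, crude or inverse-probability weighted."""
    score = propensity_by_severity(rows)
    patients = {1: 0.0, 0: 0.0}
    events = {1: 0.0, 0: 0.0}
    for severe, is_treated, n_patients, n_events in rows:
        if weighted:
            p = score[severe]
            weight = 1 / p if is_treated else 1 / (1 - p)
        else:
            weight = 1.0
        patients[is_treated] += weight * n_patients
        events[is_treated] += weight * n_events
    return (events[1] / patients[1]) / (events[0] / patients[0])


crude_rr = risk_ratio(strata, weighted=False)
weighted_rr = risk_ratio(strata, weighted=True)
print(round(crude_rr, 2), round(weighted_rr, 2))  # 2.33 1.0
```

The crude comparison suggests that the drug more than doubles the risk, purely because sicker patients are treated more often; weighting by the propensity score balances severity across groups and recovers the null effect. As the text notes, this only works because severity was measured.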
The use of all these methods is complex and requires extensive training, careful implementation, and appropriately balanced interpretation of findings (Murray, 2010; Johnson et al, 2009).

Conclusion
Bradford-Hill said that "all scientific work is incomplete - whether it be observational or experimental. All scientific work is liable to be upset or modified by advancing knowledge. That does not confer upon us a freedom to ignore the knowledge we already have, or to postpone the action that it appears to demand at a given time". This is especially true in the context of the assessment of medicines. Society demands an answer that bridges the gap between clinical trials and the real world.
The increasing availability of large automated healthcare databases represents a unique opportunity to study the landscape of drug use patterns and both beneficial and adverse drug effects in routine clinical practice. However, research on the assessment of medicines under real-life conditions is methodologically complex and can be challenging. It will not infrequently result in biased estimates of drug effects if epidemiological principles are not followed.
In recent decades, we have learned much more about the appropriate role of healthcare databases in pharmacoepidemiology research. A clear and comprehensive understanding of the strengths and weaknesses of pharmacoepidemiological methods, of how the data were collected, of enrollment and coverage factors, of approaches to minimize confounding in the absence of randomization, and of the specificity of clinical outcome assessment, amongst other factors that might have affected data quality and validity, is essential.

The use of healthcare databases for pharmacoepidemiology will continue to increase in the coming years. But this must unquestionably be accompanied by substantial capacity building in this field.
There have been lessons learned, but there are challenges ahead. Knowing what those numbers mean for practice and communicating their meaning effectively will ultimately be one of the biggest real-life challenges.