Biomarkers in Drug Development: A Useful Tool but Discrepant Results May Have a Major Impact

Introduction
The high costs incurred when drugs fail during clinical trials have prompted interest in biomarkers as biological indicators for progress of disease, effect of therapeutic interventions, and/or drug-induced toxicity. One of the goals is to reduce attrition of drugs during the clinical, and probably preclinical, phases of drug development and, hence, the overall cost of drug development. The role of biomarkers in guiding decisions has been increasing exponentially in every phase of drug development, from drug discovery and preclinical evaluations through each phase of clinical trials and into post-marketing studies. In early phases of drug development, biomarkers are used to evaluate activity in animal models, prove mechanism of action and concept of an investigational entity, bridge pre-clinical and clinical pharmacology, and evaluate safety in animal models and humans. In late stages of drug development, biomarkers can be used to make decisions in the evaluation of dose-response and optimal regimen for desired pharmacologic effect and safety, and some biomarkers can be used as a surrogate endpoint for efficacy and/or toxicity. Also, biomarkers can predict patients' response to a compound, enabling patient enrichment strategies that identify patient populations more likely to respond to the drug therapy or to avoid specific adverse events. This shift toward "personalized medicine," in which the patient receives a treatment based on his/her genetic makeup as well as medical profile, is helping the drug industry achieve the goal of quick and cost-effective research, especially in poorly served areas such as neurodegenerative disorders and cancer. Biomarker assays range from exploratory assays performed on a fit-for-purpose basis to rigorously validated assays when a biomarker is used as a surrogate end point, for patient selection, or for randomization into different arms.
Validation of biomarker assays should be considered a continuous and evolving process. It is imperative that biomarker development be accelerated along with therapeutics. Assay validation is essential, but of equal or even greater importance is the monitoring of assay performance and level of quality during production. Despite all of the potential benefits of using biomarkers to advance pharmaceutical research and development, discrepant results can pose a threat to development programs by triggering false decisions.

Some historical background
Medical practice in ancient times was performed mainly by physical examination and observation of the patient. However, testing of biological fluids for diagnostic and predictive purposes started around 6000 years ago with the analysis of human urine (Armstrong, 2007). Prior to Hippocrates (460-370 BC), Babylonian, Egyptian and Far Eastern cultures were familiar with the diagnostic utility of urine. Urine assessments by Sumerian and Babylonian physicians were documented as far back as 4000 BC, when they first discovered that something other than physical evidence of disease could be utilized to make a clinical decision. In those days, whenever a patient was diagnosed with a serious disease, physicians would ask him/her to breathe into a sheep's nose. The animal would then be slaughtered and the liver removed and carefully inspected for evidence of disease. The resulting observation was used to predict the outcome of the patient's case and its treatment. The Babylonians based this diagnostic art on their theory that the liver was the center of the human body's organs and that the whole of human physiology occurred there, which aligns with our modern perception of the metabolic importance of hepatic cells. One of the earliest recorded diagnostic tests for hormones in body fluids was documented in the time of Ikhnaton and Cleopatra, when Egyptian pharaohs tested for pregnancy by adding a patient's urine to a bag containing wheat and barley seeds. If the seeds germinated, the woman was pregnant. If the barley seeds germinated first, it was an indication that the unborn infant was male, but if the wheat seeds germinated first, it indicated that the woman was carrying a female fetus. Testing of this pregnancy theory in 1963 showed 70% predictive value. Over the centuries, pregnancy testing became more sophisticated.
In the early twentieth century, scientists in several laboratories across Europe independently described the presence of a substance that promotes ovary development and growth in rabbits and mice, and they recognized that the substance was a specific hormone, now known as human chorionic gonadotropin (hCG). In 1928, German scientists Aschheim and Zondek developed the first bioassay for hCG in urine by injecting a woman's urine into an immature rat and looking for an estrous reaction: hyperemia of the ovaries and growth of the follicles. Another ancient diagnostic test was documented in Hindu cultures, utilizing the sweetness of urine and its ability to attract black ants to diagnose diabetes mellitus (Winsten, 1969; Haber, 1988; Leavitt, 2006; Armstrong, 2007; Eknoyan, 2007; and NIH, 2011). Urine was once, and still is to a degree, regarded as a powerful fluid in many cultures. Towards the end of the 18th century, doctors with an interest in chemistry turned their attention to the scientific basis of urine analysis and to its use in practical medicine. To serve this interest, Boehringer Mannheim launched the first urine dipstick in the mid-20th century. Over the last four decades, the importance of biomarkers in clinical trials and patient management has increased exponentially. The following graph (Figure 1)

Brief biomarker laboratory regulatory aspects
CLIA (Clinical Laboratory Improvement Amendments) certification and CAP (College of American Pathologists) accreditation have become a customary element of laboratory capability presentations. As a consequence of media coverage and public concern regarding false-negative Pap smears in detecting cervical cancer, congressional hearings were held in 1976 and again in 1988 on medical laboratory practices. Congress passed the Clinical Laboratory Improvement Amendments of 1988 (CLIA 1988) to ensure accuracy and reliability of laboratory testing. This legislation, for the first time, extended federal regulation to all laboratories (hospital, independent, and physician office laboratories, etc.) that perform microbiological, serological, chemical, hematological, immunohematological, immunological, toxicological, cytogenetical, exfoliative cytological, histological, pathological or other examinations on materials derived from the human body, for the purpose of diagnosis, prevention of disease, and treatment of patients. The Centers for Medicare & Medicaid Services (CMS) has primary responsibility under CLIA for regulating the approximately 195,000 labs that are certified under CLIA. Since 1988, the CMS, along with the Food and Drug Administration (FDA) and the Centers for Disease Control and Prevention (CDC), has been working to improve the quality of laboratory testing through a variety of research, educational and enforcement activities. As one of these quality enhancement measures, certified labs performing moderate and high complexity tests have been required, since 1994, to participate successfully in approved proficiency testing (PT) programs, which provide an external evaluation of the accuracy of each laboratory's test results.
These surveys may be conducted by the CLIA program, a State survey agency under contract with CMS, or private CMS-approved agencies such as the Commission on Office Laboratory Accreditation (COLA), the Joint Commission on Accreditation of Healthcare Organizations (JCAHO), or CAP. CAP is the largest PT provider in the world, and is also the largest laboratory-accrediting agency in the United States. Through the PT program, the CAP provides individual laboratories with multiple specimens, usually three to five, for testing on three different occasions per year. The participants analyze the specimens and return results to the CAP for evaluation. Each participating laboratory then receives a report of its performance along with a report summarizing the results of all participating laboratories in a peer group format. The CAP believes that by comparison to the most relevant instrument/reagent combinations, a laboratory's performance is accurately assessed (www.CAP.org). The CAP automatically forwards results for analytes regulated for proficiency testing to the CMS for laboratories that are accredited by the US government under the CLIA. Of the tens of thousands of biomarker assays applied for different purposes in clinical trials, only 88 analytes (biomarkers) are regulated by CLIA: 9 hematology, 17 general immunology, 1 diagnostic immunology, 5 immunohematology, 25 routine clinical chemistry, 7 endocrinology, 15 toxicology and therapeutic drug monitoring, and 9 microbiology. For all other 'non-regulated' analytes, CLIA dictates that the laboratory must have a quality assurance plan that establishes the accuracy and reliability of the testing at least twice per year. Proficiency testing programs offer a wide array of products to conveniently assist laboratories in fulfilling this requirement (www.CAP.org). The CAP offers proficiency testing for more than 1,000 analytes (CLIA, 1988; CMS, 2006; Paxton, 2007; Howerton et al, 2010; Benneyan, 2011; and CAP, 2011).

Utility of biomarkers in clinical trials
High attrition rates are a critical issue in drug development, especially within oncology (Walker and Newell, 2009) [...] be 10,000:1. From an average of 10,000 new chemical entities presented to pharmacology and safety testing, only about 10% (1,000) would pass the criteria of activity and lack of toxic side effects. Of the 1,000 entities approved for clinical study, only 1% (10) would show the combined safety and efficacy required by the clinic. Last but not least, out of the 10 NDA submissions to the FDA, only 10% (1 drug), on average, passes the review process (Network Science, 2011). This high attrition rate adds to the expensive and lengthy process of developing new medication, resulting in stagnation in the development of new compounds (Bowalekar, 2011). One of the options for the pharmaceutical industry to improve the high attrition rate during drug development is to move away from treat-and-see testing of new drugs in patients and focus on generating translational biomarkers early in the research process to enable more predictive evaluation of drug action in clinical trials (Gool et al, 2010). As a form of encouraging guidance, the FDA released a critical path initiative document in 2004, emphasizing the need for developing innovative trial designs. One of the innovations suggested was to use biomarkers to evaluate safety and effectiveness, predict effectiveness, provide informative links between mechanism of action and clinical effectiveness, connect animal and human studies, and serve as surrogate endpoints. New imaging technologies and emerging techniques of pharmacogenomics and proteomics show great promise in this respect, but much developmental work and standardization of the biological, statistical, and bioinformatics methods must occur before these techniques can be easily and widely applied (FDA, 2004).
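The attrition arithmetic quoted above can be checked with a short calculation. The sketch below applies only the stage-wise pass rates given in the text (10%, 1%, and 10%) to a starting pool of 10,000 entities; the stage labels and the function name are illustrative choices, not terminology from the cited source.

```python
# Stage-wise pass rates as quoted in the text (Network Science, 2011).
# The stage labels are shorthand chosen for this illustration.
STAGES = [
    ("pharmacology and safety testing", 0.10),
    ("clinical study", 0.01),
    ("FDA review", 0.10),
]


def attrition_funnel(starting_entities, stages):
    """Return (stage name, entities surviving that stage) for each stage."""
    survivors = []
    remaining = starting_entities
    for name, pass_rate in stages:
        remaining = remaining * pass_rate
        survivors.append((name, round(remaining)))
    return survivors


funnel = attrition_funnel(10_000, STAGES)
for name, count in funnel:
    print(f"{name}: {count} entities remain")

# Overall ratio of entities entering testing to drugs passing review.
overall_ratio = 10_000 / funnel[-1][1]
```

The product of the three pass rates is 0.0001, which reproduces the overall 10,000:1 ratio.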

Classes of biomarkers in clinical trials
Clinical laboratory measurements are an essential component of most drug studies to demonstrate safety and efficacy.

Safety biomarkers
Application of the most sensitive procedures to identify toxicity as early as possible in clinical development, before engagement in expensive phase III trials, is essential. Thus, at phases I and II, careful selection of the correct tests should be mandatory, and the selection of those tests should be based on the compound profile and pre-clinical toxicology data (Craig, 2004). In addition to physical examination, vital signs, and electrocardiogram (ECG), constantly monitored safety lab biomarkers can act as common vital organ function tests applied across different therapeutic areas or as specialized testing applied to detect unique toxicities. Safety testing can be classified as follows:

Liver safety tests
The liver's unique position between the gastrointestinal tract and the rest of the body, in addition to its vast capability to perform diverse functions essential for life, dictates its enormous role in maintaining metabolic homoeostasis of the body and turns it into the first resort for drugs and other toxicants. The most common tests applied in clinical trials are serum aspartate transaminase (AST), alanine transaminase (ALT), alkaline phosphatase (ALP), gamma-glutamyl transferase (GGT), and bilirubin. Changes observed in different liver tests depend on the significance of the liver's involvement. ALT is located mainly in the cell cytosol but AST is located mainly in the mitochondria, which makes ALT quicker in release after acute hepatocellular injury. Also, ALT elevation lasts longer than AST's, and it is a more liver-specific enzyme for which serum elevation is rarely observed in conditions other than parenchymal liver damage. ALP and GGT are membrane-bound enzymes. ALP significantly increases in conditions that cause biliary obstruction but only moderately elevates in parenchymal cell damage. Although renal tissue has the highest content of GGT, the primary source of serum GGT is believed to be of hepatobiliary origin. Release of GGT into serum is caused by the toxic effect of alcohol or some drugs on the microsomal structures of hepatic cells. Bilirubin is a marker of the excretory liver functions, and both conjugated and unconjugated bilirubin increase in obstructive liver damage (Craig, 2004 and Balistreri and Rej, 1994). Albumin and prothrombin time can be used to assess changes in the synthetic functions of the liver, but significant changes may only happen with chronic hepatocellular damage. According to the FDA's guide, drug-induced liver injury (DILI) is predominantly hepatocellular damage that can be revealed by the rise in serum ALT or AST.
The ability to cause some hepatocellular injury, however, is not a reliable predictor of a drug's potential for severe DILI. Many drugs that cause transient rises in serum aminotransferase (AT) activity do not cause progressive or severe DILI, even if drug administration is continued. Severe DILI is induced by those drugs that can cause hepatocellular injury extensive enough to reduce the liver's ability to clear bilirubin from the plasma or to synthesize prothrombin and other coagulation factors (FDA, 2009).
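The distinction drawn above, between an aminotransferase rise alone and an aminotransferase rise accompanied by impaired bilirubin clearance, can be expressed as a simple screening rule. The following is a minimal sketch, not a validated algorithm: the multiples of the upper limit of normal (3x for ALT/AST, 2x for bilirubin) are assumptions chosen for illustration and resemble commonly cited screening criteria; they are not values stated in this chapter, and a real study would take its thresholds from the protocol. The class, function, and label names are hypothetical.

```python
from dataclasses import dataclass

# Assumed screening thresholds, expressed as multiples of the upper limit of
# normal (ULN). Illustrative only; real thresholds come from the protocol.
AT_MULTIPLE = 3.0         # ALT or AST at or above 3x ULN
BILIRUBIN_MULTIPLE = 2.0  # total bilirubin above 2x ULN

SEVERE = "possible severe DILI: review urgently"
INJURY = "hepatocellular injury signal: monitor"
BILIRUBIN_ONLY = "isolated bilirubin elevation: evaluate"
NO_SIGNAL = "no signal"


@dataclass
class LiverPanel:
    alt: float            # U/L
    ast: float            # U/L
    bilirubin: float      # mg/dL
    alt_uln: float
    ast_uln: float
    bilirubin_uln: float


def classify_liver_signal(panel):
    """Classify one liver panel using the assumed ULN multiples above."""
    at_elevated = (
        panel.alt >= AT_MULTIPLE * panel.alt_uln
        or panel.ast >= AT_MULTIPLE * panel.ast_uln
    )
    bilirubin_elevated = panel.bilirubin > BILIRUBIN_MULTIPLE * panel.bilirubin_uln
    if at_elevated and bilirubin_elevated:
        return SEVERE
    if at_elevated:
        return INJURY
    if bilirubin_elevated:
        return BILIRUBIN_ONLY
    return NO_SIGNAL


# Example with made-up values: ALT is 5x ULN and bilirubin is 2.5x ULN.
example = LiverPanel(alt=200, ast=50, bilirubin=3.0,
                     alt_uln=40, ast_uln=40, bilirubin_uln=1.2)
print(classify_liver_signal(example))
```

A rule of this kind only flags cases for clinical review; it does not establish causality or exclude other explanations such as biliary obstruction.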

Hematology safety biomarkers
Bone marrow is a primary target for toxicity caused by many classes of drugs including cytotoxic compounds, and the effect can be reflected by changes in peripheral blood components. Complete blood count, one of the fundamental safety indices in drug development, includes total hemoglobin, hematocrit, red cell count, mean red cell volume, mean cell hemoglobin, red cell distribution width (%), mean cell hemoglobin concentration, total white cell count, differential white cell count (neutrophils, lymphocytes, basophils, eosinophils, and monocytes), and platelets (Craig, 2004).

Bone safety biomarkers
Bone is a living connective tissue constantly undergoing a process of remodeling, which includes a degradation stage of bone resorption by the action of osteoclasts and a building stage of formation mediated by the action of osteoblasts. Serum calcium and inorganic phosphates have been traditionally used as bone biomarkers in clinical trials.

Basic metabolic safety biomarkers
Blood glucose, triglycerides (TG), total cholesterol, low density lipoprotein cholesterol (LDL-c), and high density lipoprotein cholesterol (HDL-c) are commonly used within the safety biomarker panel but can be used as efficacy biomarkers too.

Efficacy biomarkers
The purpose of efficacy testing differs fundamentally from safety monitoring in that biomarkers are being used to demonstrate a change in all, or at least a good proportion, of treated subjects; in other words, the more positive the biomarker, the higher the efficacy of a drug. Efficacy biomarkers can be classified into the following groups: surrogate, predictive, pharmacodynamic (PD), and prognostic biomarkers. Figure (2) illustrates different classes of biomarkers: drug metabolizing enzyme, drug receptor, and intermediary pathway substrate polymorphisms as predictive of a drug response, an intermediary signal produced from the interaction of a drug with its receptor as a PD biomarker, and a surrogate biomarker to demonstrate the final drug action. The diagram shows that Panels 1 and 4 have similar pharmacological pathway components, in terms of quality and quantity, but the magnitude of the endpoint's action can be significantly affected by the rate of converting the inactive drug to an active one. Panels 2 and 3, compared to Panel 1, show that two subjects may have the same efficiency of drug metabolizing enzymes but, due to mutations in the drug receptor or downstream intermediary protein substrate, the drug does not perform its intended final action.

Surrogate biomarkers
A surrogate endpoint is a laboratory or physical measurement used in clinical trials to indicate a drug's response and can be used in place of a clinical endpoint (Lonn, 2001). A clinical endpoint is a characteristic or variable that reflects a patient's health status, usually related to efficacy, and is usually acceptable as evidence of efficacy for regulatory purposes. A surrogate biomarker can be used to assess the benefit of or harm from a therapeutic agent, based on epidemiologic, therapeutic, pathophysiologic, or other scientific evidence that links the biomarker to the clinical outcome (Woodcock, 2011). Surrogate biomarkers are hugely beneficial when substituted for clinically significant endpoints, also known as patient-oriented outcomes, which can be very time consuming and expensive to prove, for example, blood pressure (BP) for stroke or myocardial infarction. Other examples of surrogate biomarkers are cholesterol, LDL-c, triglycerides, blood glucose, glycosylated hemoglobin (HbA1c), arterial plaque thickness, CD4 count or viral load for HIV response, HCV RNA viral load for HCV response, bacterial count, tumor size, and bone mineral density (Temple, 2009; Woodcock, 2011). Even if evidence for surrogacy is not enough, such types of biomarkers are useful in proving the concept for which a candidate drug is to be used, such as the inhibition of platelet aggregation by an antithrombotic agent (Temple, 2009).
Table (1) summarizes the basic biomarker safety panel recommended for each trial, which looks very similar to the basic standard-of-care lab profile.

Predictive biomarkers
Predictive biomarkers can stratify patient populations into responders and non-responders, predict whether or not a drug will have the intended effect, or forecast the extent to which a drug can be effective and/or toxic in different patient populations. The discovery of Cytochrome P450-2D6 (CYP2D6) polymorphism in 1977 (Mahgoub et al, 1977 and Tucker et al, 1977) opened the door for research on the impact of genetic variability in such metabolizing enzymes on the efficacy and toxicity of drugs. However, 34 years after this discovery, only 76 genetic and genomic biomarkers, mainly CYP2D6 followed by CYP2C19, are on FDA labels of 70 approved drugs, mainly for oncology and psychiatry followed by antiviral and cardiovascular drugs (Figure 3). Drug label information on genomic biomarkers can describe drug exposure and clinical response variability, risk for adverse events, genotype-specific dosing, mechanisms of drug action, polymorphic drug target and disposition genes, and precautions-interactions, contraindications, patient counseling, nutritional management (FDA, 2011b).

Predictive biomarkers in personalized medicine
Completion of the human genome project about a decade ago enormously facilitated our understanding of human genetics and the associated biology, and it has become increasingly clear that patients with different genetic makeup manifest diseases differently and respond to medication differently, in terms of both efficacy and safety. Also, there is a rapidly spreading notion that uncertainty about which patients might respond positively or negatively to a particular treatment regimen has significant consequences on patient health and attrition rate in drug discovery, that empirical drug development is unsustainable, and that biomarkers can provide guidance and help with these issues. In this respect, the personalization of medicine, via targeting the right population, offers the potential for mitigating the problem of universalizing therapy into a single, all-encompassing solution. If two populations with genetic and biological makeup similar to Panels 1 and 2 depicted above in Figure (2) use the same drug, Panel 1's population would observe the desired effect while the population in Panel 2 would only be exposed to the side effects of the drug. The population depicted by Panel 4 will need to double the dose used for Panel 1 to get the same effect. Figure (4) illustrates the concept of predictive biomarkers and personalized medicine. In graph (A), the use of the biomarker had no impact, while in graph (B) the biomarker-positive population responded significantly better to a target drug, as measured by survival rate, than the biomarker-negative population treated with the same drug or the control arm (marker-positive or -negative) receiving the standard therapy. Recent advances in cancer research have focused on drug candidates with specific molecular targets including mutated genes in cancer cells.
To achieve the greatest benefit from such types of therapeutic agents, populations that are positive for the target should be identified and exclusively treated, and, in order to do that, an in-vitro diagnostic test (IVD) should be readily available. This IVD can be an existing test for a biomarker that is classified by the FDA as "known valid;" in other words, the biomarker is accepted by the scientific community at large as a predictor of clinical outcomes, such as LDL-c, HbA1c, and CYP2C19. When a biomarker appears to have predictive value but is not yet replicated or widely accepted, it is classified by the FDA as "probable valid," as in the cases of EGFR and KRAS mutations. These types of biomarkers can be used in targeted therapies to demonstrate the efficacy or toxicity of an agent during a drug's clinical development, and then become "known valid" when treatment is approved (Frueh, 2006). This approach mandates co-development of an IVD with a drug, known as a companion diagnostic (CDx). Co-development can occur during any stage of drug development but, ideally, a biomarker should be integrated early in the drug's development program so that trial data will support both drug and test approval. Clinical qualification of a biomarker should be prospective, but the retrospective path remains a possibility. Under any circumstances, the biomarker assay should be analytically validated before testing clinical samples. As shown by the following table (Table 2), only a few oncology drugs and IVDs have been approved thus far (Datamonitor, 2011). Despite the biological, analytical, clinical, regulatory, and project management hurdles, co-development of drugs and IVDs appears to be the future in facilitating the personalized medicine approach. Figure (5) depicts the ideal path for drug-IVD co-development.
After the end of phase II and prior to initiation of the pivotal phase III trial, in which the predictive biomarker will be used for patient randomization, both CDER (the Center for Drug Evaluation and Research, the branch of FDA responsible for drug approval) and CDRH (the Center for Devices and Radiological Health, the branch of FDA responsible for approval of medical devices) should approve the co-development approach. Figure (6) illustrates the process of qualifying a predictive biomarker in pivotal phase III.

Pharmacodynamic (PD) biomarkers
These are the biomarkers which demonstrate that a drug hits its target and impacts its biochemical pathway. Such types of biomarkers are necessary to demonstrate proof of the drug's mechanism of action (POM), i.e. markers of pharmacological response. This class constitutes the majority of biomarkers in early phases of drug discovery (preclinical, phase I, and, probably, phase II). In correlation with pharmacokinetic (PK) measurements, this class of biomarkers can help to determine effective dose and dose schedule. The biomarker illustration in Figure (2) shows that detection of an intermediary signal can indicate that the drug hit its target and the magnitude of the signal can reflect the efficacy of the interaction. The contribution of biomarkers to the goals of 87 phase I oncology trials was analyzed to reveal that biomarkers supported the proposed mechanism of action in 39% of the trials, contributed to dose selection for subsequent phase II studies in 13%, contributed to the selection of dosing schedule for phase II studies in 8%, and were considered by the authors to be potentially useful for selecting a patient population in subsequent studies in 19% of the trials. These biomarkers were determined in serum (36.8% of total), tumor tissue (25.6%), peripheral blood mononuclear cells (22.7%), normal solid tissue (3.7%), and cerebrospinal fluid (0.2%), in addition to 10.9% by special in-vivo imaging. The non-imaging biomarkers included proteins, cytokines, and enzyme activity in serum, CSF, or tissue lysates, proteins by immunohistochemistry (IHC), and DNA and RNA gene expression (Goulart et al, 2007).

Prognostic biomarkers
Prognostic biomarkers can predict the risk or outcome of a disease in a patient population without the involvement of therapy. For example, a population that tested positive for a given prognostic biomarker can survive longer or live better than another that tested negative. Figure (7) depicts the concept of a biomarker's ability to predict overall survival in Panel A but not Panel B. In addition to their predictive power, prognostic biomarkers may help enrich a clinical trial by choosing people more likely to respond to treatment. Examples of prognostic biomarkers include prostatic specific antigen to predict survival in prostatic cancer patients (Kelloff et al, 2004), preoperative CA125 to predict metastatic disease in patients with uterine carcinoma (Gupta et al, 2011), CRP as a risk factor in cardiovascular events (Ridker et al, 2008 and Abd et al, 2011), CRP to predict reduced overall and disease-free survival in breast cancer (Allin et al, 2011), and serum LDH to predict overall survival in metastatic brain tumors (Eigentler et al, 2011). The number of circulating tumor cells (CTC) was shown to predict overall and progression-free survival in patients with metastatic breast and ovarian cancers (FDA, 2005 and Poveda et al, 2011), and to predict the effect of treatment earlier than imaging (Nakamura et al, 2010). Also, HER2-positive CTC was suggested to have prognostic value in metastatic breast cancer (Hayashi et al, 2011).

Discrepant results and their major impact on clinical trials

Types of laboratory errors
Despite all of the potential benefits of using biomarkers to advance pharmaceutical research and development and to fully implement the concept of personalized medicine, discrepant results can be a threat to development programs by triggering false decisions. Many tools and strategies have been adopted to enhance laboratory quality, including internal quality control (QC) procedures, external quality assessment programs, certification and accreditation, licensing of lab professionals, continuing education programs, and the regulation of lab services. Despite these quality measures, some imminent sources of error still require urgent intervention.
Most errors affecting laboratory test results occur in the pre-analytical phase, where they account for more than 90% of the errors currently encountered within the entire diagnostic process, and the positive trends towards the reduction of laboratory errors over the past decade (predominantly those in the analytical phase) have hardly affected the pre-analytical phase (Lippi et al, 2006a; Lippi et al, 2006b and Lippi, 2009). Pre-analytical variables often result in sub-optimal or poor specimen quality, which in turn produces incorrect results. Laboratory errors can be classified as pre-analytical, analytical, or post-analytical.

Pre-analytical
Pre-analytical errors occur between the test order and the analytical phase, and may affect sample integrity and its suitability for analysis. Table (3) lists the most common pre-analytical variables that may impact biomarker results.

Patient preparation for the test
These variables are the result of broad heterogeneity in several pre-analytical processes, mainly due to the lack of reliable guidelines (Lippi et al, 2006a, Lippi et al, 2006b, and Lippi, 2009). Assurance of availability of appropriate lab instruments (mainly centrifuges and freezers) during clinical site qualification, along with clear, concise, illustrative lab manuals, well-trained phlebotomists, good techniques for tissue biopsy, and onsite training are essential tools in mitigating the pre-analytical lab errors.

Post-analytical
Post-analytical errors may occur after a sample has been processed and analyzed by an instrument, as listed in Table (4) below.
- Improper documentation of test results, wrong manual transcription, or questionable interface between analyzing instrument and database.
- Incorrect patient identification information entered at time of test. Patients' results may be mixed with one another.
- Failure to recognize and act on abnormal results, e.g. repetition of samples with unexpected results or panic values.

Table 4. List of most common post-analytical variables

Although uncommon, post-analytical errors can be very serious, especially when producing alarming values without verification, e.g. very low platelet count from a sample which was inappropriately collected or mixed. To mitigate this class of errors, the central lab has to implement an effective process for sample identification and acquisition, proper interface of testing device with the database, and a process for identification and repeat of analysis, and probably recollection, of samples with unexpected abnormal values, especially those with panic results. A pharmaceutical company needs to take the proper measures to ensure an error-free post-analytical phase and reporting.

Analytical
With momentous efforts from lab professionals and in-vitro diagnostic partners, clinical laboratory errors due to analytical issues have been significantly reduced over time through the evolution of innovative technologies and the implementation of a number of quality control and quality assurance (QC and QA) check points, including internal (electronic) QC, liquid QC, calibration, delta checks, method comparison, and proficiency testing, among other measures. Figure (8) illustrates the two types of analytical errors, random (A) and systematic (B): random error can affect one sample or a few samples within an analytical run, while systematic error affects all samples analyzed after an error has occurred and until it is fixed. Each biomarker assay has a "default" imprecision, that is, the oscillation of values around the average of observations when the same sample is measured multiple times. Typically, a clinical lab considers an assay to be performing well if results from a quality control sample are evenly distributed around the average and fall within "Average ± 2SD", as shown in Figure (8). Results scattered outside the 2SD limits in panel (A) denote random error, while panel (B) shows consistent drift or bias (systematic error). Random error is usually caused by the pipetting of wrong volumes, air bubbles, small clots, or inadequately mixed samples. Systematic error can affect part of a single analytical run, a whole run, or a few runs, or it can have a longitudinal impact which may span the entire life of a testing device. Figure (8B) illustrates a short-term systematic error which may occur following inappropriate calibration of a device or the use of an improperly qualified new lot of reagents.
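The "Average ± 2SD" convention described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the QC values are invented, the function names are hypothetical, and the "six consecutive results on the same side of the average" threshold used to suggest a systematic shift is an arbitrary choice made for this sketch, not a rule taken from the text.

```python
from statistics import mean, stdev

# Hypothetical QC results used to establish the assay's "default" imprecision.
BASELINE = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]

# Hypothetical later runs: one with a single outlier, one with a sustained shift.
RANDOM_RUN = [100, 99, 108, 101, 98, 100]
SHIFTED_RUN = [100, 101, 103, 103, 102, 103, 103, 102]


def qc_limits(baseline):
    """Return (average, lower limit, upper limit) using Average ± 2SD."""
    avg = mean(baseline)
    sd = stdev(baseline)
    return avg, avg - 2 * sd, avg + 2 * sd


def flag_run(results, baseline, shift_len=6):
    """Flag results outside ± 2SD and a possible systematic shift.

    shift_len is an assumed, illustrative threshold: the number of
    consecutive results on one side of the average that raises the flag.
    """
    avg, lower, upper = qc_limits(baseline)
    outside = [i for i, x in enumerate(results) if x < lower or x > upper]

    longest = streak = 0
    prev_side = 0
    for x in results:
        side = (x > avg) - (x < avg)  # +1 above, -1 below, 0 on the average
        if side != 0 and side == prev_side:
            streak += 1
        else:
            streak = 1 if side != 0 else 0
        prev_side = side
        longest = max(longest, streak)

    return {"outside_2sd": outside, "possible_shift": longest >= shift_len}
```

In this sketch, `flag_run(RANDOM_RUN, BASELINE)` flags only the third result as outside the limits, while `flag_run(SHIFTED_RUN, BASELINE)` flags no individual result but does report a possible shift, which mirrors the difference between panels (A) and (B) as described.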
Figure (9) demonstrates a real example of the difference between results from splits of 43 samples analyzed for ALP on two different chemistry analyzers in two labs. Figure (9A) shows that the actual values were remarkably higher in Lab B than in Lab A, and Figure (9B) shows that the percent difference (% bias) of Lab B from Lab A ranged between 150 and 350% across different levels of ALP.
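As a hedged illustration of how a percent bias of this kind is typically computed from split samples, the sketch below treats Lab A as the reference and uses the usual definition, (B - A) / A × 100. The formula is an assumption about how the figure was derived, and the paired values are invented for the example; they are not the 43 ALP results from the figure.

```python
def percent_bias(reference, comparator):
    """Percent difference of the comparator result from the reference result."""
    if reference == 0:
        raise ValueError("reference result must be non-zero")
    return (comparator - reference) / reference * 100.0


def summarize_bias(pairs):
    """Return (minimum, maximum, mean) percent bias over (lab_a, lab_b) pairs."""
    biases = [percent_bias(a, b) for a, b in pairs]
    return min(biases), max(biases), sum(biases) / len(biases)


# Hypothetical split-sample results (Lab A, Lab B), in arbitrary units.
SPLIT_SAMPLES = [(40, 100), (100, 350), (200, 900)]
```

With these invented pairs, `summarize_bias(SPLIT_SAMPLES)` gives a bias ranging from 150% to 350%, the same span as quoted for the figure. Reporting bias per sample rather than as one average matters here, because the size of the discrepancy can change with the analyte level.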

Lack of traceability as a major source of systematic errors
The overall quality of clinical laboratory results can be compromised by the lack of traceability, that is, the absence of true standardization, or at least harmonization, of test results from method to method, from platform to platform, or even between different reagent lots for an assay. If assays or technologies are properly validated via proper method-to-method and lab-to-lab comparison, most aspects of the lack of traceability, except reagent lot-to-lot variability, can be easily highlighted. Assay validation is always completed using a single lot of reagent, or a few lots at best, and so cannot detect low-performing lots introduced afterwards. This is not only a problem of "sophisticated" biomarker assays, e.g. IHC, ISH, genotyping, etc.; it also impacts supposedly well-standardized chemistry assays that have been used for decades as standard-of-care.
In past publications (2009, 2011a, and 2011b), I emphasized the gravity of the problem. For one sample analyzed by more than 4,000 laboratories using different types of instruments and thromboplastin reagents, the INR (international normalized ratio of prothrombin time) values ranged between 2.9 and 7.6. When another sample was analyzed for activated partial thromboplastin time (aPTT), thrombin time (TT), and anti-FXa assay (heparin test) in different labs employing different platforms and methodologies, the ratio of maximum to minimum reported results was up to 4-fold, 40-fold, and 50-fold, respectively. For yet another sample analyzed for ALP and LDH, the ratio was 4-fold. In other instances, the percent difference between the maximum and minimum HDL-c results reported for a sample analyzed on different platforms was up to 47%; and when another sample (with a target value of 152 mg/dl as estimated by a standard procedure) was analyzed on different platforms for LDL-c, the results reported by different labs ranged from about 120 to 202 mg/dl. Lack of traceability between different lab platforms or methodologies, even for well-established technologies like the chemistry or immunochemistry analyzers employed by central labs, is mainly due to the unavailability of primary or secondary standards to calibrate devices or methodologies across different brands. Also, there is no "gold standard" device or methodology to use as a predicate, even for well-established lab analyzers. While greater automation and innovation have, in general, improved laboratory performance over the last decade, they are also a double-edged sword: in the absence of a gold standard approach, they seem to contribute significant systematic bias between different devices and reagents. The way the FDA (2011) approves analytical devices or methodologies, based on substantial equivalence to legally marketed devices (predicate devices), should be drastically revised.

Impact of lack of traceability on clinical trials
This long-term systematic source of error is commonly overlooked. It is often aggravated by the disconnect between clinical laboratory services on the one side and clinicians and drug developers on the other, and by the misinterpretation of test results when general clinical guidelines per test are followed instead of a reference range or cut-off values set for the specific platform or reagent. Considering the widely applicable INR therapeutic target range for warfarin (2.0 to 3.0 units), a result from the sample mentioned above can be within the therapeutic target, indicate slight anticoagulation, or demonstrate dramatic anticoagulation which may need immediate medical intervention. The difference between maximum and minimum results from the anti-FXa example can be more than 23-fold the unfractionated heparin (UFH) therapeutic range (0.35-0.70 U/ml). Following the National Cholesterol Education Program (NCEP) guidelines, a clinician may interpret the results from the LDL-c example as near optimal, borderline high, high, or very high and will treat his/her patient accordingly. The problem can impact decision making by pharmaceutical developers if they use absolute biomarker values to compare the outcomes of different studies on a drug's efficacy and/or toxicity, or employ biomarkers to bridge between different drug candidates belonging to a particular class of compounds. Also, global clinical trials may be impacted where different specialty or safety biomarker labs are employed. It is not uncommon for different lab locations within a global organization (or even one lab location) to use different platforms interchangeably to analyze samples from the same trial. A pharmaceutical development program may take 10 years or more; thus switching biomarker vendors is likely, using multiple platforms or changing platforms within a lab is common, and employing different lots of reagents and calibrators is certain.
Without close attention to these variables, results from different platforms, even within the same lab, may lead to erroneous go/no-go decisions and make comparability of results from different studies almost impossible. Also, if such lab tests are used as efficacy or toxicity biomarkers without being appropriately understood and interpreted, the drug may be inappropriately labeled. The inter-laboratory discrepancies in results could be even higher than those included in my articles, because the data were gathered from "well-controlled" laboratories for theoretically standardized tests used to manage patients' health and as surrogate biomarkers in clinical trials. Until global standardization or harmonization approaches are employed, the pharmaceutical industry needs to monitor biomarker data rigorously and understand these challenges for better interpretation of biomarker results.
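To make the interpretation problem concrete, the following minimal Python sketch applies fixed clinical cut-offs to the discrepant results quoted in this section. The LDL-c category boundaries are the commonly cited NCEP ATP III thresholds (in mg/dl) and the INR range is the 2.0 to 3.0 target mentioned in the text; the function and constant names are hypothetical, and the sketch is not a clinical decision tool.

```python
import bisect

# NCEP ATP III LDL-c boundaries in mg/dl (lower edge of each higher category).
NCEP_LDL_CUTOFFS = [100, 130, 160, 190]
NCEP_LDL_LABELS = ["optimal", "near optimal", "borderline high", "high", "very high"]


def classify_ldl(value_mg_dl):
    """Map an LDL-c result to its NCEP category using fixed cut-offs."""
    return NCEP_LDL_LABELS[bisect.bisect_right(NCEP_LDL_CUTOFFS, value_mg_dl)]


def classify_inr(value, low=2.0, high=3.0):
    """Place an INR result relative to the 2.0-3.0 therapeutic target range."""
    if value < low:
        return "below target"
    if value > high:
        return "above target"
    return "within target"


# Results reported for one and the same sample by different labs (from the text).
LDL_RESULTS = [120, 152, 202]
INR_RESULTS = [2.9, 7.6]
```

Under these cut-offs, the lowest and highest LDL-c results for the same sample fall into "near optimal" and "very high", and the two INR extremes fall "within target" and "above target". The sample is the same in each case; only the platform differs, which is why platform-specific reference ranges matter.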

Impact of discrepant results on personalized medicine
As explained earlier, companion diagnostics (CDx) are essential in enabling targeted therapeutic products to achieve their expected safety and efficacy. Therefore, the risk from failure of a CDx is equal to the risk of wrongly using the therapeutic product. It has been reported that BCR-ABL (a leukemia biomarker and approved CDx) gene transcripts have been analyzed at over 150 hospitals and labs and that the results were not comparable: the number of transcripts reported by 6 reliable, CLIA-certified labs (2 commercial labs and 4 cancer institutions) varied by more than 2 logs. The introduction of common primers, reagents, and calibrators, which was difficult to achieve, improved comparability (Jessup, 2011).

In addition to the adverse effect of incorrect test results on patient management, the use of loosely validated assays may spoil a trial outcome and impose a wrong go/no-go decision, especially if the rate of the target mutation is relatively small, as shown by figure (10). For example, suppose the rate of a mutant gene is 20%, and the rate of response to a therapeutic agent is 70% in patients with the mutant (M) gene and 10% in patients with the wild type (WT) gene, as with tyrosine kinase inhibitors in mutant and WT EGFR (Mitsudomi and Yatabe, 2007). If the biomarker assay is 90% sensitive and 90% specific, which some professionals would consider acceptable or good, there are two implications:

1. In the clinical trial, the efficacy signal will be diluted. Instead of the two arms being cleanly separated (100% WT and 100% M in the corresponding arms) and the efficacy in the M arm clearly demonstrated, the signal in the M arm is diluted by the carryover of WT patients falsely identified as M because of the 10% false positive rate. The average efficacy signal, or overall survival, in the M arm will therefore be lower than it would be with a 100% specific assay. The signal in the WT arm would erroneously increase due to the carryover of M patients falsely identified as WT because of the 10% false negative rate, but as the majority of that arm is still WT, the impact is not as substantial as in the M arm. In this example, the average efficacy signal would be 0.52 and 0.12 in the M and WT arms with the 90% sensitive, 90% specific assay, instead of 0.70 and 0.10 with a 100% sensitive, 100% specific assay. This means that the ratio of the efficacy signal (M arm to WT arm) would be reduced from 7.0 to 4.4. Using the same model, such a reduction due to an assay's low performance would change the efficacy signal in the target population of any given drug. For Herceptin, for instance, the average overall survival for patients with high levels of HER2 and for the control arm would change from 16 and 11.8 months (Roche, 2010) to 14.7 and 11.9 months, respectively, decreasing the efficacy signal from 1.4 to 1.2.

2. If the biomarker is used as a CDx to qualify a patient for treatment after drug approval, 2 out of the 20 M subjects (10%) will not be given the drug, and 8 out of the 80 WT subjects (10%) will be wrongly treated with the drug.
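The arithmetic behind the worked example above can be reproduced with a short Python sketch. It assumes the simple mixing model implied by the text: each arm's observed signal is the average of the true M and WT signals, weighted by how many true M and true WT patients the assay assigns to that arm. The function name and parameter names are hypothetical.

```python
def arm_signals(prevalence, sensitivity, specificity, effect_mutant, effect_wt):
    """Return the observed average signal in the assay-defined M and WT arms.

    prevalence is the true fraction of mutant (M) patients; effect_mutant and
    effect_wt are the true signals (response rate or survival) in each group.
    """
    true_pos = prevalence * sensitivity               # M correctly called M
    false_neg = prevalence * (1 - sensitivity)        # M wrongly called WT
    false_pos = (1 - prevalence) * (1 - specificity)  # WT wrongly called M
    true_neg = (1 - prevalence) * specificity         # WT correctly called WT

    m_arm = (true_pos * effect_mutant + false_pos * effect_wt) / (true_pos + false_pos)
    wt_arm = (false_neg * effect_mutant + true_neg * effect_wt) / (false_neg + true_neg)
    return m_arm, wt_arm


# Response-rate example from the text: 20% mutant, 70% vs 10% response.
m_arm, wt_arm = arm_signals(0.20, 0.90, 0.90, 0.70, 0.10)

# Survival example from the text: 16 vs 11.8 months, same prevalence and assay.
m_months, wt_months = arm_signals(0.20, 0.90, 0.90, 16.0, 11.8)
```

Rounded, `m_arm` and `wt_arm` come to 0.52 and 0.12 (a ratio of about 4.4), and `m_months` and `wt_months` come to 14.7 and 11.9 months (a ratio of about 1.2), in agreement with the figures quoted above. Setting both sensitivity and specificity to 1.0 recovers the undiluted values of 0.70 and 0.10.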

Conclusion
There is no doubt that biomarkers can play a vital role in drug development as tools to monitor drug toxicity, prove a compound's mechanism of action, prove the concept for which a drug will be used, and predict efficacy and toxicity. Biomarker hypothesis-driven drug development and personalized medicine seem to be the future of the drug industry. However, despite the enormous improvement in the quality of biomarker laboratories, some sources of error still pose an imminent risk to drug development. Due to the lack of standardization, even for well-trusted safety biomarker assays, a major source of error is discrepant results from different laboratories, or even from the same lab employing different platforms or methodologies. This source of error is commonly overlooked and is often aggravated by the disconnect between clinical laboratory services on the one side, and clinical guidelines, clinicians, and drug developers on the other. Results from the same sample can vary substantially, even for trusted standardized tests from "well-controlled" laboratories, with consequent impact on drug developers' decisions and on patient management, including the personalized medicine approach. It may be expensive for pharmaceutical companies to operate and maintain in-house laboratories, because assets may be underutilized, good laboratory and QA professionals are in short supply globally, laboratory certification and licensure are consequently difficult to acquire and maintain, and technologies evolve rapidly. Therefore, outsourcing lab services can be an attractive option. Using contract research organization or academic institution laboratory services may reduce overhead and operating costs and provide pharmaceutical companies with access to new technologies as needed. While greater automation has, in general, improved laboratory performance over the last decade, it is also a 'double-edged sword'.
The increase in automation combined with consolidation of instrument/reagent/calibration manufacturers has resulted in many suppliers oversimplifying technology and electronically locking-out laboratories from using competitors' reagents or independent calibrators so as to increase sales and profits. Thus, laboratories have become deeply dependent on suppliers for their quality and are often forced to change methods, instruments, calibrators and reference ranges at the whim of suppliers. This has been further compounded by many laboratories attempting to cut costs by reducing experienced and educated laboratory professionals (doctoral level and even master's level) who have the knowledge and experience to maintain stable calibration and optimal accuracy and precision. In fact, some labs have gone further by reducing bench level personnel from 4-year degree certified medical technologists to 2-year associate degree laboratory technologists or lower.

[Figure 10: mixed population (20 M + 80 WT); panels: 100% specific, 100% sensitive; 90% specific, 90% sensitive]

Regulatory and laboratory accrediting agencies need to pay more attention, find resources, and implement a road map to fix the major challenge in biomarker laboratories: the lack of traceability between different technologies. Meanwhile, it remains the responsibility of drug developers to assure that a biomarkers lab has the right tools and skills to analyze samples from a clinical trial, the assay validation is at the level of the decision to be made, and that biomarker data are properly interpreted.