Next-generation personalized drug discovery: the tripeptide GHK hits center stage in chronic obstructive pulmonary disease

Chronic lung diseases (CLDs), including chronic obstructive pulmonary disease (COPD), are the second leading cause of death worldwide. The first report of database-driven drug discovery in carefully phenotyped COPD specimens has now been published in Genome Medicine, combining gene expression data in defined emphysematous areas with connectivity-map-based compound discovery. This joint effort may lead the way to novel and potentially more efficient concepts of personalized drug discovery for COPD in particular, and CLD in general. See research article http://genomemedicine.com/content/4/8/67/abstract

Chronic lung diseases (CLDs), including chronic obstructive pulmonary disease (COPD), asthma, lung cancer, neonatal chronic lung disease and pulmonary fibrosis, are the leading diseases worldwide with regard to mortality, prevalence and socioeconomic burden [1,2]. The devastating impact of these diseases is largely due to increased environmental effects on lung health in recent years (for example, air pollution, allergens, and cigarette smoke or related particles), limited understanding of their mechanisms of pathogenesis, and consequently, insufficient therapeutic options. Importantly, COPD is the only leading cause of death with increasing incidence among the top diseases: by 2020, more than 6 million deaths every year are projected to be due to COPD [2]. COPD is defined as 'a decreased airflow that is not fully reversible' and is classically diagnosed by lung function testing (FEV1, forced expiratory volume in one second). Pathologically, COPD is characterized by airway inflammation, loss of functional alveolar tissue, irreversible airflow obstruction and loss of lung function [3,4]. It has two main pathological features: small airway disease (SAD; which includes airway inflammation with increased mucous production, activation of immune cells, airway wall remodeling and peribronchiolar fibrosis) and emphysema (defined as the destruction of the distal alveolar architecture due to distal airspace enlargement). Ultimately, both features can lead to a loss of functional alveolar epithelium and impaired lung function, with an apparent inability of the lung to self-repair.
It has long been known that a decline in lung function does not accurately correlate with the degree of emphysema or SAD. It has therefore been a major challenge to definitively phenotype COPD patients with regard to their underlying dominant pathological process (that is, emphysema or SAD), and this has hampered clinical study design and drug discovery. Until recently, approaches for the accurate correlation of gene expression signatures with histological subtypes of COPD were not available, and this has prevented the molecular characterization of cell populations involved in dominant pathogenic processes at different stages of COPD/emphysema. This is in striking contrast to the extensive profiling of idiopathic pulmonary fibrosis (IPF), which has resulted in a clear gene expression signature in the lungs and blood of individuals with IPF, accelerating biomarker identification and disease stratification for this disease in recent years [5].

Definition of a strong disease signature
A recent study by Campbell et al. published in Genome Medicine now appears to overcome these challenges in COPD [6]. The study is remarkable for the following reasons: (1) gene signatures in different grades of emphysema were derived from regions of quantitatively assessed pathology (by micro-computed tomography and stereology); (2) these signatures were extensively compared with available COPD datasets published by different groups; (3) these validated signatures were then queried using the Connectivity Map (CMap) [7,8] for compounds that might be capable of reversing the emphysematous phenotype; and (4) one identified compound (the tripeptide GHK) was then tested in vitro for proof-of-principle (Figure 1).

*Correspondence: oliver.eickelberg@helmholtz-muenchen.de. Comprehensive Pneumology Center, University Hospital of the Ludwig-Maximilians-University Munich and Helmholtz Zentrum München, Member of the German Center for Lung Research, Max-Lebsche-Platz 31, D-81377 Munich, Germany

In a joint venture by several leading COPD groups, a comprehensive approach was then used to assess gene expression patterns in lung regions of COPD specimens with defined severities of emphysema [6]. The venture included groups with outstanding expertise in micro- and macro-imaging of the lung, gene expression analysis, clinical phenotyping, and most importantly, bioinformatics. The authors made use of a recently published novel method to assess regional severity of emphysema by determining the mean linear intercept between alveolar walls using micro-computed tomography [4]. They identified a set of 127 differentially regulated genes that were significantly associated with the degree of emphysema in the lung. Importantly, this signature was further validated using gene expression data from other cross-sectional studies of COPD. This signature of 127 genes was significantly enriched among COPD-regulated genes in four out of five previously published data sets, validating this set as a strong signature for lung emphysema.
Importantly, enrichment analysis of gene functions revealed an over-representation of genes involved in B-cell receptor signaling and reduced expression of genes involved in transforming growth factor (TGF)-β signaling. Indeed, an increasing number of CD79A-positive B cells correlated with the severity of emphysema, as shown by immunohistochemical analysis. Using an elegant bioinformatics approach, the authors further validated these gene sets by comparing them with a defined TGF-β activation signature, using seven publicly available gene expression studies on TGF-β-induced gene expression. Interestingly, both the induction of B-cell receptor signaling and downregulation of TGF-β activation were not fully appreciated in previous COPD profiling studies. Thus, strong and validated disease signatures depend on the stringency of data analysis and comparison with apparently unrelated, publicly available data sets from other studies.
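To make the idea of a signature being "enriched" in an independent data set concrete, the following is a minimal sketch of one common way to test such an overlap: a one-sided hypergeometric (over-representation) test. This highlight does not state which statistical method Campbell et al. used, so the test shown here is an illustrative stand-in rather than the authors' procedure. The function name and all numbers other than the signature size of 127 are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes: int, regulated_genes: int,
                           signature_size: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    total_genes     -- genes measured in the independent data set
    regulated_genes -- genes called disease-regulated in that data set
    signature_size  -- size of the signature being tested (e.g. 127)
    overlap         -- signature genes that are also disease-regulated
    """
    if min(total_genes, regulated_genes, signature_size, overlap) < 0:
        raise ValueError("all counts must be non-negative")
    if regulated_genes > total_genes or signature_size > total_genes:
        raise ValueError("subsets cannot exceed the total gene count")
    upper = min(regulated_genes, signature_size)
    if overlap > upper:
        raise ValueError("overlap exceeds the largest possible overlap")

    # Number of ways to draw the signature with exactly i regulated genes,
    # summed over i = overlap .. upper, divided by all possible draws.
    tail = sum(
        comb(regulated_genes, i)
        * comb(total_genes - regulated_genes, signature_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, signature_size)


if __name__ == "__main__":
    # Hypothetical data set: 20,000 measured genes, 1,000 of them
    # disease-regulated, and 20 of the 127 signature genes among them.
    p = hypergeom_enrichment_p(20_000, 1_000, 127, 20)
    print(f"enrichment p-value: {p:.3g}")
```

Under these assumed numbers, about six overlapping genes would be expected by chance (127 × 1,000 / 20,000), so an overlap of 20 gives a small p-value. Repeating the test across several independent data sets mirrors the "four out of five" style of validation described above.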

Unbiased identification of new drugs
Another strength of the study by Campbell and colleagues [6] lies in the use of the enriched emphysema gene expression signature for an unbiased drug screening approach by in silico connectivity mapping. CMap exploits the transcriptome as a 'universal language' describing cellular responses [7,8] to distinct drugs to connect drug discovery, biology and disease phenotypes. Classically, to exploit CMap, a disease phenotype, represented by a strong gene expression signature (in this case emphysema), is compared with gene expression profiles of cells that have been treated with distinct drugs (here GHK) for various exposure times. While the initial CMap reference catalogue comprised gene expression profiles of 164 drugs, the current catalogue contains 1,309 Food and Drug Administration (FDA)-approved small molecules that have been evaluated in five cancer cell lines [8]. CMap can thus be used as an unbiased in silico screening approach to identify drugs that have either positive or negative connectivity scores, implicating these drugs either as potential inducers of disease phenotypes or therapeutics thereof.
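The notion of positive and negative connectivity can be illustrated with a deliberately simplified sketch. The score below compares the average rank of disease-induced and disease-repressed genes within a drug's ranked expression profile. It is not the statistic that CMap itself computes, and the gene names, values and function names are all hypothetical placeholders chosen for illustration.

```python
from typing import Dict, Iterable


def rank_genes(drug_profile: Dict[str, float]) -> Dict[str, int]:
    """Rank genes from most up-regulated (rank 1) to most down-regulated."""
    ordered = sorted(drug_profile, key=lambda g: drug_profile[g], reverse=True)
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def scaled_position(genes: Iterable[str], ranks: Dict[str, int]) -> float:
    """Mean rank of `genes`, rescaled so rank 1 -> +1 and the last rank -> -1."""
    n = len(ranks)
    found = [ranks[g] for g in genes if g in ranks]
    if not found or n < 2:
        return 0.0
    mean_rank = sum(found) / len(found)
    return 1.0 - 2.0 * (mean_rank - 1) / (n - 1)


def connectivity_score(disease_up: Iterable[str], disease_down: Iterable[str],
                       drug_profile: Dict[str, float]) -> float:
    """Simplified connectivity score in [-1, 1].

    Positive: the drug shifts expression in the same direction as the disease.
    Negative: the drug shifts expression in the opposite direction, which
    marks it as a candidate for reversing the disease signature.
    """
    ranks = rank_genes(drug_profile)
    return (scaled_position(disease_up, ranks)
            - scaled_position(disease_down, ranks)) / 2.0


if __name__ == "__main__":
    # Hypothetical disease signature (placeholder gene names).
    up_in_disease = ["G1", "G2"]
    down_in_disease = ["G5", "G6"]
    # Hypothetical drug-induced log fold changes.
    reversing_drug = {"G1": -2.0, "G2": -1.5, "G3": 0.1,
                      "G4": -0.1, "G5": 1.8, "G6": 2.2}
    print(connectivity_score(up_in_disease, down_in_disease, reversing_drug))
```

In this toy example the drug represses the genes induced in disease and induces the genes repressed in disease, so the score is negative. This is the 'inverse connectivity' pattern that flags a compound as a potential therapeutic in a CMap-style screen.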
Campbell and co-workers used their enriched emphysema signature and screened for compounds that demonstrated connectivity with regard to gene expression changes observed in emphysema. Connectivity analysis was performed with high stringency by double fitting of the two gene expression signatures, that is, induced with severity of emphysema and deactivated TGF-β signaling. They identified GHK, a natural tripeptide involved in wound healing, which showed inverse connectivity. Importantly, in experimental studies, GHK partially reversed emphysema-associated phenotypes in fibroblasts from COPD patients.

Future directions
It will now be of utmost interest and importance to see whether further animal experiments will confi rm that GHK is indeed able to revert disease in vivo, using relevant animal models of emphysema or COPD. If this is shown to be the case, clinical studies will then surely investigate whether GHK has clinical potential for individuals with COPD with a dominant emphysematous phenotype. It will be interesting to see whether such a comparative bioinformatics approach will generate new and/or overlapping gene expression signatures in lung, blood or bronchoalveolar lavage samples not only in COPD, but also in other heterogeneous CLD, such as asthma or chronic neonatal lung disease.
Since the release of CMap, several studies have shown the feasibility of this drug-screening approach for the identification of new therapeutics, drug repurposing, and the prediction of off-target or side effects, as reviewed by Qu and Rajpal [8]. Recently, Wang and colleagues [9] applied a similar in silico approach to screen for candidate therapeutic compounds for lung adenocarcinoma. The CMap approach also offers the possibility of predicting synergies of drug combinations. In the future, the use of CMap might also add to our understanding of disease pathogenesis by connecting gene expression profiles of drugs that specifically inhibit single pathways with transcriptional profiles of disease phenotypes.
The use of gene expression profiling in histologically quantified tissue specimens with CMap querying reveals an exciting new approach to perform knowledge-based drug screening. Biomedical research has classically involved the tedious mechanistic understanding of single molecular targets and their interactions as a starting point for lead compound identification and subsequent optimization, but future research might move towards a more empirical analysis of gene expression signatures in combination with drug signatures.
In summary, the approach described herein (Figure 1) might facilitate lead compound identification and repurposing of drugs. Drug discovery is currently estimated to take approximately 15 years, with 90% of drugs failing to move beyond early clinical testing stages [10]. Therefore, such an approach is expected to save precious time and money, and ultimately might have the potential to decrease disease burden more effectively.