Introduction

Acute lung injury (ALI) and its more severe form, the acute respiratory distress syndrome (ARDS), are syndromes of acute respiratory failure that are characterized by acute pulmonary edema and lung inflammation. ALI and ARDS include a spectrum of increasing severity of lung injury defined by physiologic and radiographic criteria. The American–European consensus committee has defined the illness as acute onset with a PaO2/FiO2 ≤ 300 for ALI and ≤ 200 for ARDS, bilateral infiltrates on chest radiograph, and a pulmonary artery occlusion pressure < 18 mmHg or the absence of clinical evidence of left atrial hypertension [1]. ALI has an incidence of approximately 200,000 patients per year in the United States [2]. Estimates of mortality rates for ARDS vary from 26% to 58%, and epidemiologic data indicate that ARDS may account for up to 75,000 deaths per year in the USA [2, 3, 4].

Alterations in the levels of mediators of coagulation such as protein C, thrombomodulin, and plasminogen activator inhibitor-1 (PAI-1) have been associated with worse clinical outcomes in patients with ALI. It is not clear whether the alterations in these protein levels are determined exclusively by environmental factors that precipitate ALI, such as the virulence of the infection, the extent of aspiration, and the severity of shock, or whether they are determined by genetic factors as well. Clinical and epidemiological observations in patients with ALI and data from animal experiments support the hypothesis that genetic factors may play a part in the susceptibility to development of ALI. Polymorphisms in genes encoding proteins of the protein C and fibrinolytic pathways are associated with altered levels of their respective mediators and with phenotypes of disordered coagulation. It is possible that these genetic polymorphisms are associated with the clinical outcomes in patients with ALI because they can regulate production and activity of proteins of the coagulation and fibrinolytic pathways. An association between polymorphisms of the genes encoding these proteins and clinical outcomes would (a) strengthen the importance of the association between the proteins and clinical outcomes, (b) provide further evidence for a cause and effect relationship of this association, (c) lead to a better understanding of the biological role of different genes and their products in the pathogenesis of ALI and (d) help develop profiling of clinical risk, so that specific therapeutic and preventive measures might be directed towards more susceptible patients [5].

This article summarizes the coagulation pathway abnormalities in ALI, presents evidence for a possible contribution of genetic susceptibility to ALI/ARDS, and provides an overview of important considerations in the design and interpretation of investigations into the genetic basis of ALI focused primarily on the protein C and fibrinolytic pathways.

Role of coagulation and fibrinolysis in ALI/ARDS

Activation of the coagulation cascade, specifically thrombin formation, can lead to inflammatory events [6]. Several inflammatory mediators are involved in the activation of coagulation and, in turn, coagulation proteins are themselves actively involved in the inflammatory process. Interactions between the coagulation and inflammatory pathways are well described in related clinical conditions such as sepsis and cardiopulmonary bypass [7, 8].

Intra-alveolar fibrin deposition is a hallmark of several acute inflammatory lung diseases, and enhanced alveolar procoagulant activity is reported in patients with ventilator-associated pneumonia, severe pneumonia, and ARDS. Fibrin deposition may exert beneficial effects on gas exchange by sealing leakage sites if the lung capillary endothelial and epithelial barriers are compromised. On the other hand, fibrin deposition in the alveoli may be harmful since it can lead to activation of neutrophils and fibroblasts, compromise endothelial integrity, contribute to loss of surfactant activity favoring alveolar collapse, decrease alveolar fluid clearance and induce thrombotic obstruction of the microcirculation [9, 10, 11, 12, 13, 14]. Injury to the pulmonary microcirculation from inflammatory and thrombotic mechanisms probably contributes to the increase in dead space fraction that may be an independent predictor of mortality in ARDS [15]. Elevated plasma tissue factor levels in patients with ARDS have been associated with poor clinical outcomes, suggesting that activation of systemic coagulation may be detrimental in patients with ARDS [16].

ALI and the protein C and fibrinolytic pathways

The protein C pathway plays an important role in both the regulation of blood coagulation and the prevention of lethal effects of sepsis. Protein C is an endogenous anticoagulant that is activated by thrombin in the presence of thrombomodulin. The membrane-bound endothelial protein C receptor (EPCR) potentiates this activation (Fig. 1). Activated protein C (APC) in turn inactivates the activated factors V and VIII, thereby inhibiting the coagulation cascade [17]. In addition, neutrophils express EPCR on their surface and APC inhibits neutrophil chemotaxis in experimental models of pulmonary inflammation [18, 19].

Fig. 1

Protein C (PC) is converted to activated protein C (APC) while bound to endothelial protein C receptor (EPCR) in the presence of thrombin (T) and thrombomodulin (TM). APC then inactivates activated factors V and VIII, thereby inhibiting the coagulation cascade

Pulmonary edema fluid obtained early in the course of ALI has lower concentrations of protein C than do controls with hydrostatic pulmonary edema, and reduced plasma protein C levels are associated with higher mortality and more non-pulmonary organ system dysfunction (Fig. 2). In addition, markedly elevated levels of thrombomodulin in the pulmonary edema fluid from patients with ALI suggest that release of thrombomodulin probably occurs from both epithelial and endothelial sources in the lung [20].

Fig. 2

a Boxplot summary of plasma protein C levels in 10 normal controls and 45 patients with ALI. Box encompasses 25th to 75th percentile, error bars encompass 10th to 90th percentile, and horizontal bar shows median. b Plasma protein C levels in 45 patients with ALI were consistently higher in those with favorable clinical outcomes, including hospital survival, greater than 14 days of unassisted ventilation over a 28-day period, fewer than two organ failures or the absence of circulatory shock. Data are shown as mean ± SD. [Ware LB, Fang X, Matthay MA (2003) Protein C and thrombomodulin in human acute lung injury. Am J Physiol Lung Cell Mol Physiol 285:L516, Fig. 1 and Fig. 2. Reprinted with permission of The American Physiological Society]

Fibrin is degraded by plasmin, a proteolytic enzyme present in the tissues in the form of an inactive precursor, plasminogen. Plasminogen activator inhibitor-1 is the primary inhibitor of plasminogen activation and thus is a major inhibitor of fibrinolysis. Fibrinolysis is impaired in ALI, in part from elevated levels of PAI-1. Pulmonary edema fluid obtained early in the course of ALI has elevated concentrations of PAI-1 (Fig. 3). The elevated PAI-1 levels have predictive value for identifying patients who are more likely to die among all patients with ALI, regardless of the clinical risk factor that predisposes to the development of ALI [21].

Fig. 3

Plasminogen activator inhibitor-1 (PAI-1) antigen levels in patients with acute lung injury (ALI) and hydrostatic pulmonary edema (HYDRO). PAI-1 antigen level was measured by ELISA in plasma (open boxes) and edema fluid (filled boxes) from patients with either clinical ALI or HYDRO. Data are plotted in box plot format (median, 25–75%) and compared using the Mann–Whitney U-test. P values are as indicated. [Prabhakaran P, Ware LB, White KE, Cross MT, Matthay MA, Olman MA (2003) Elevated levels of plasminogen activator inhibitor-1 in pulmonary edema fluid are associated with mortality in acute lung injury. Am J Physiol Lung Cell Mol Physiol 285:L22, Fig. 1. Reprinted with permission of The American Physiological Society]

An analysis of 779 patients from the ARDS network clinical trial of low tidal volume versus high tidal volume has confirmed these findings [22, 23]. Lower levels of plasma protein C and higher levels of plasma PAI-1 were independent predictors of mortality, and the combination of low protein C and high PAI-1 carried an even higher risk of mortality. The prognostic value of protein C and PAI-1 was not altered by exclusion of patients with co-existing sepsis. Thus, abnormalities of coagulation and fibrinolysis are not simply a manifestation of sepsis, but are associated with ALI, regardless of the underlying cause. Interestingly, a protective ventilator strategy did not have a significant effect on plasma levels of protein C and PAI-1. The alterations in the levels of these coagulation and fibrinolysis proteins may be in part determined by genetic factors independent of the degree of direct lung injury.

Evidence for a possible genetic contribution to ALI

Variability in susceptibility and outcome

Both adult and pediatric patients develop ALI [4]. A number of clinical disorders that directly or indirectly injure the lung can lead to ALI, but of the apparently large population at risk of developing ALI, only a small fraction develop the clinical syndrome. Among patients who have identical risk factors for lung injury, such as sepsis and aspiration, fewer than 50% develop ALI [24, 25]. Likewise, it is difficult to predict who will recover from ALI. Current clinical and physiological measurements cannot account for outcome variability. Individual susceptibility and variation in the clinical course and outcome may be related in part to the underlying genetic characteristics of the individual patient.

Positional cloning linkage studies

Genome-wide searches in animal models have identified a number of quantitative trait loci (QTL) that associate with susceptibility to experimental lung injury. A QTL on chromosome 4 that contains the gene for the toll-like receptor 4 is associated with ozone-induced injury in animal models [26]. An association between ozone- and nickel-induced lung injury and a region of chromosome 6 that contains genes for pro-inflammatory cytokines, as well as HLA, has also been reported in animal models. Similarly, nickel sulfate (NiSO4)-induced ALI in rat models is controlled by at least five genes, including aquaporin 1, transforming growth factor-α (TGF-α), and surfactant protein B [27].

Gene expression studies

Researchers at the Hopgene Programs for Genomic Applications and other investigators have used micro-arrays to study gene expression profiles in animal models subject to mechanical stress and in human endothelial cells subject to stretch in vitro. These investigators reported up-regulation in the expression of multiple genes involved in the immune response, including chemotaxis, cell proliferation, inflammatory responses, and coagulation genes for PAI-1, tissue factor precursor, and fibrinogen-α [28].

Candidate gene association studies

Much of the genetic variation between individuals lies in differences known as single-nucleotide polymorphisms (SNPs). Polymorphisms are variant forms of genes that occur in at least 1% of the population. Associations between susceptibility to and severity of ALI and polymorphisms in genes encoding cytokines and other mediators of inflammation have been reported in a number of association studies [29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41] (Table 1). Results of these studies suggest that it is unlikely that a single gene or SNP will explain the diversity in risk, clinical manifestations (diverse phenotypes) and outcomes of ALI. ALI is probably a disease with genetic heterogeneity involving a number of genes at different loci, with interactions both among the multiple genes and with environmental factors that predispose to the development of the clinical disorder. Based on our prior work and that of other investigators [16, 20, 21, 22, 23, 42], we hypothesize that genes encoding the mediators of the coagulation pathways have strong biological plausibility for a significant role in the susceptibility to and clinical severity of ALI.

Table 1 Association studies of genetic polymorphisms and ALI

The study of genetic factors

Genetic factors traditionally have been studied using either positional cloning or focused candidate gene approaches. Positional cloning approaches have most often entailed genome-wide linkage studies of families with multiple affected individuals. Since the precipitating event for ALI tends to be individual-specific, ALI is not well suited to family-based study designs.

More recently, genome-wide association studies of unrelated individuals have been used as well. However, these studies are hypothesis generating because they lead to the identification of a limited set of genes, which can then be further studied. It is also possible to begin with a candidate gene approach, evaluating the association between genetic variants and disease directly. The first step is to identify genotypes a priori, based on the results of previous positional cloning studies or biological plausibility, and then study the association of specific polymorphisms with the phenotype of interest (mild versus severe disease or the presence or absence of disease). Some of the important considerations (challenges) encountered in the design and interpretation of genetic association studies are detailed in the next section.
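As a rough illustration of the last step, a candidate gene association with a binary phenotype is often summarized as an odds ratio from a 2 × 2 table of carrier status by case status. The Python sketch below assumes this simple carrier versus non-carrier (dominant) comparison and a Woolf log-scale confidence interval; the function name and all counts are hypothetical and are not taken from any of the studies cited here.

```python
import math
from statistics import NormalDist


def carrier_odds_ratio(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers,
                       conf_level=0.95):
    """Odds ratio for carrying the variant, with a Woolf (log-scale) interval."""
    cells = (case_carriers, case_noncarriers,
             control_carriers, control_noncarriers)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lo, hi = carrier_odds_ratio(60, 40, 45, 55)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An interval that excludes 1 suggests an association in this unadjusted comparison; it does not address confounding, multiple testing, or causality, which are discussed in the sections that follow.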

Considerations in the design of association studies

Candidate genes

The first step in the design of association studies is the selection of the most appropriate candidate genes. Knowledge of the gene and SNP function is mandatory for the appropriate design and interpretation of candidate gene association studies. Investigators usually rely on making an assessment of the function of a candidate gene or SNP from previous work and published literature [43].

A number of SNPs and haplotypes in the genes encoding mediators of the protein C and fibrinolytic components of the coagulation pathway alter levels of their respective mediators. Some of these polymorphisms are relatively common in the general population (minor allele frequency [MAF] > 5%) and have been associated with specific disease phenotypes of disordered coagulation (Table 2) [44, 45, 46, 47, 48, 49, 50]. However, there are other less frequent but well-described mutations and polymorphisms in the coagulation pathway, which may also have a role in ALI due to their effect on the coagulation cascade [51, 52, 53].

Table 2 Well-characterized coagulation pathway polymorphisms that may have a role in ALI

Biological plausibility

Finding an association does not provide proof of causality. Also, defining the biological mechanism underlying the associations should be an important part of a genetic association study. The gene product must be known to alter the risk or course of the disease, and the proposed mechanisms by which the gene product may influence the course of the disease should either be known or be a part of such a study. Whenever the biological plausibility for the choice of a candidate gene is based on a gene product that can be measured in plasma or other body fluids, it is preferable to include these measurements in both the plasma and broncho-alveolar lavage fluid. This should be done regardless of what is known about the effect of the particular polymorphism on the baseline levels of the gene product. For example, the 4G/5G genotype accounts for only a small part of variance in PAI-1 levels at baseline, but the difference in the levels of PAI-1 between 4G and 5G becomes more obvious in the presence of environmental and disease factors that stimulate PAI-1 expression [54, 55, 56]. Associations with these intermediate phenotypes, whenever present, will further strengthen the significance of the overall association with the clinical phenotype.

Defining the phenotype

Special attention is required in the design of association studies for correct classification of cases and controls. Improper classification may lead to a bias in the association and reduced power to detect an effect. Definition of the phenotype in ALI is problematic because there is no definitive diagnostic test for ALI. In addition, ALI has a complex etiology and there may be multiple subtypes within the ALI phenotype [57]. One option is to study ALI from a specific etiology to decrease the heterogeneity of the cases, e.g., ALI due to direct pulmonary injury, ALI due to trauma or ALI due to pancreatitis. However, the potential disadvantages of this approach include the possibility that there may be further subtypes within each of these groups and that a sufficient number of cases with a homogeneous phenotype may be difficult to ascertain, especially in single-center studies [57]. On the other hand, even though the etiology of ARDS is complicated, a common injury pathway may contribute to the damage of lung tissue in these patients [58]. Though there are limitations to the existing definitions of ALI and sepsis, syndromes can be studied with a reasonable degree of validity and reliability as has been done in other disciplines [59]. The American–European Consensus definition serves as a uniformly accepted guideline for defining lung injury and is the most universally accepted criterion for ALI/ARDS. However, in order to avoid misclassification that may result from lack of reliability among the ALI diagnostic criteria, more than one investigator should review the entire primary diagnostic data, including the chest radiographs. Only subjects with a high degree of agreement among the different investigators should be included in such studies.

Sample size and power

Although the power of association analysis to detect a genetic contribution to complex disease is greater than that of linkage studies, many association studies lack adequate power because of a small sample size [60]. The number of subjects needed to achieve adequate power is dependent on the frequency of the polymorphism, the disease model (dominant, recessive, co-dominant), and the genetic risk ratio, a measure of gene effect on the scale of risk [61, 62]. Therefore, studies based on polymorphisms with a higher minor allele frequency require a smaller number of subjects for adequate power than studies based on rare polymorphisms with similar genetic effect. Also, rare variants (< 5%) are more likely to be limited to specific population groups, whereas common alleles (> 10%) are more likely to be found globally [61]. The common disease–common variant hypothesis proposes that most of the genetic risk for common complex diseases is due to a small number of relatively common variants [63]. However, it is possible that a rare allele with a strong penetrance may have a more profound effect on the phenotype.

Continuous variables provide added power and precision in epidemiological studies relative to categorical variables [64]. There are advantages to the use of quantitative traits as outcomes, especially in studies of modifier genes in complex diseases [65]. Association studies using quantitative traits as outcomes in complex diseases have been previously reported in asthma, Alzheimer's disease, diabetes and obesity [66, 67, 68, 69, 70]. Using a continuous phenotype is less ambiguous and more informative, objective and statistically powerful than binary categorizations of disease status [71]. Like binary outcomes (e.g., mortality), continuous phenotypes (e.g., ventilator-free days, pulmonary dead space fraction, plasma levels of the mediator) used as outcomes in association studies of ALI are susceptible to ventilator strategies and interventions such as fluid loading or diuresis; nevertheless, they may yield greater power than dichotomous outcomes such as mortality, depending on their effect size and variability. Using standard power calculations [72] we estimated the sample size required for α = 0.05 and β = 0.2 (80% power at a 5% significance level) in a study with a minor allele frequency of 25%, dominant mode of inheritance, and a continuous phenotype as the outcome (Table 3). If there is a sufficiently large effect, even a modest number of patients may yield satisfactory power in such study designs. If the effect is smaller, substantially more subjects will be required. In contrast, a study using a dichotomous endpoint such as mortality (estimated 25%) would need a sample size of approximately 500 patients to detect an OR of 1.8 or more with the assumptions described above.
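To make the dependence of sample size on effect size concrete, the following Python sketch applies a textbook normal-approximation formula for comparing the mean of a continuous outcome between carriers and non-carriers of the minor allele. It is not the calculation of reference [72] and is not claimed to reproduce Table 3. It assumes a dominant model, Hardy–Weinberg proportions (so the carrier fraction is 1 − (1 − MAF)²), a two-sided test, and an effect expressed in standard deviation units; the function names are illustrative.

```python
import math
from statistics import NormalDist


def carrier_fraction(maf):
    """Fraction of subjects with at least one minor allele (assumes HWE)."""
    return 1.0 - (1.0 - maf) ** 2


def total_sample_size(effect_sd, maf=0.25, alpha=0.05, beta=0.2):
    """Total subjects needed to detect a mean difference of `effect_sd`
    standard deviations between carriers and non-carriers.

    Uses N = (z_{1-alpha/2} + z_{1-beta})^2 / (d^2 * p * (1 - p)),
    where p is the carrier fraction and d the standardized difference.
    """
    normal = NormalDist()
    z_sum = normal.inv_cdf(1 - alpha / 2) + normal.inv_cdf(1 - beta)
    p = carrier_fraction(maf)
    return math.ceil(z_sum ** 2 / (effect_sd ** 2 * p * (1 - p)))


# Illustrative effect sizes (in SD units), not values from Table 3.
for d in (1.0, 0.5, 0.25):
    print(d, total_sample_size(d))
```

Because the required number scales with 1/d², halving the detectable effect roughly quadruples the sample size, which is why a study powered for a large effect can be badly underpowered for a modest one.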

Table 3 Sample size estimates for continuous outcomes (assumptions: allele frequency=0.25, α = 0.05, β = 0.2)

Selection of controls

Controls should represent the reference population that is at risk for development of the disease but has not yet developed it. There is no agreement on ideal controls that best match the reference population in ALI or sepsis [57, 73]. Studies of ALI in the past have utilized either normal healthy individuals or critically ill patients at risk for development of ALI as controls. Studies that utilize the healthy population as controls may result in identification of genes that predispose to development of critical illness rather than to the identification of genes that predispose to ALI itself. Studies utilizing just the hospital-based at-risk population may be confounded by a selection bias as well. They tend to eliminate the population of controls that contributes to cases with rapidly evolving ALI presenting directly with ALI/ARDS without passing through an intermediate phenotype. In order to overcome these issues, one may select two groups of controls, i.e., healthy age- and ethnicity-matched controls from the community and another group of hospital-based at-risk controls. Another approach is to use the case-cohort study design, where a cohort of at-risk patients is followed for the development of ALI, thereby eliminating bias that may occur due to inappropriate selection of a control population. This approach requires significantly more resources and patients to allow a sufficient number of cases to emerge before the association study can proceed [60].

Linkage dysequilibrium

Linkage dysequilibrium refers to the observation that specific neighboring alleles tend to occur together on the same chromosome more often than would be expected by chance alone. The combination of alleles along a chromosome that are all in high linkage dysequilibrium with each other is referred to as a haplotype. A chromosomal region with high linkage dysequilibrium and low haplotype diversity is referred to as a haplotype block. Each time a mutation occurs, it results in the creation of a new haplotype. Over generations this linkage is broken into several further haplotypes by recombination, new mutations and genetic drift. Haplotypes grouped on the basis of their evolutionary origin are referred to as clades.
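The degree of linkage dysequilibrium between two bi-allelic loci is commonly summarized by D, D′ and r². The Python sketch below computes these from the four haplotype frequencies using their standard definitions; the function name and the example frequencies are hypothetical, and in practice haplotype frequencies usually have to be estimated from unphased genotypes, which this sketch does not attempt.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, |D'|, r^2) for two bi-allelic loci with alleles A/a and B/b.

    Arguments are the frequencies of the four haplotypes and must sum to 1.
    """
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab          # allele frequency of A at the first locus
    p_B = p_AB + p_aB          # allele frequency of B at the second locus
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    # D: excess of the AB haplotype over what independence would predict.
    d = p_AB - p_A * p_B
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r_squared = d * d / denom
    return d, d_prime, r_squared


# Hypothetical haplotype frequencies, for illustration only.
d, d_prime, r_squared = ld_measures(0.4, 0.1, 0.1, 0.4)
print(d, d_prime, r_squared)
```

A pair of SNPs with r² close to 1 carries nearly the same information, which is the property that tag SNP selection relies on.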

Both haplotype-based and single SNP approaches (Table 4) can be applied concurrently to candidate genes that are implicated in disease pathogenesis, and the two approaches are not mutually exclusive. By combining the re-sequencing information on coagulation genes available at the Seattle SNPs website (http://pga.mbt.washington.edu) with one of the available software programs that identify areas of linkage dysequilibrium and the tag SNPs representing these blocks of linkage dysequilibrium, investigators can design association studies that account for most of the common genetic variation in these genes [74, 75, 76]. This approach can be used in addition to testing for specific SNPs that have been shown to be associated with altered levels of gene product in previous studies.

Table 4 Comparison of a haplotype versus single SNP approach

Multiple comparisons

The increasing number of candidate genes implies that multiple association scans will be conducted on the same patient samples. This approach leads to testing of multiple hypotheses, which can lead to false-positive results. However, because the markers are often correlated, the increase in the false-positive rate from multiple comparisons in genetic association studies is less than would be expected if the markers were entirely independent. Therefore, Bonferroni correction of these data markedly overcorrects for the inflated false-positive rate and thereby discards valid information in the sample [60]. Computational techniques based on multiple permutations of the data set, which yield a distribution for the test statistic under the null hypothesis, and the false discovery rate method are accepted ways of adjusting for multiple comparisons in genetic association studies [77].
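The contrast between Bonferroni correction and a false discovery rate approach can be sketched in a few lines of Python. The example below assumes the Benjamini–Hochberg step-up procedure as the false discovery rate method (the text does not specify which variant is meant); the function names and the p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Find the largest rank k with p_(k) <= k * alpha / m and reject the
    k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            largest_k = rank
    reject = [False] * m
    for idx in order[:largest_k]:
        reject[idx] = True
    return reject


# Hypothetical p-values from ten SNP tests, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni_reject(p_values)), sum(benjamini_hochberg_reject(p_values)))
```

With these inputs the false discovery rate procedure retains one more finding than Bonferroni, at the cost of controlling the expected proportion of false discoveries rather than the chance of any false positive.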

Population stratification

Confounding by population stratification is another important issue in population-based association studies. False associations may be reported if there is variation in the incidence or severity of a particular disease among different ethnic groups, which may also happen to have a variation in the allele frequencies of the studied SNP. This issue can be managed at the design stage by choosing to study only a single ethnic group or by stratifying by ethnicity. An alternative approach involves testing for other unrelated SNPs at the so-called null loci and then adjusting for population stratification using either a structured association or a genomic control strategy. The structured association method reconstructs the population substructure using the data from the many unlinked markers, splits the data accordingly, and then tests for allele frequency differences between those sub-populations [78]. Modifications to improve the robustness of these procedures have also been described [79]. On the other hand, the genomic control method estimates background association between populations by using the data from SNPs not related to the disease of interest. The individual test statistic of interest is corrected by dividing it by a correction factor derived from the median of all the test statistics generated for the null loci. This method adjusts for the overdispersion of the test statistics generated due to population substructure [80]. However, there can be loss of power in the face of increasing population substructure, and therefore experiments should preferably be designed to minimize population substructure [81].
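A minimal Python sketch of the genomic control adjustment follows. It assumes 1-degree-of-freedom chi-square test statistics and the commonly used scaling in which the median of the null-locus statistics is divided by the median of a 1-df chi-square distribution (about 0.455) to give the inflation factor; it also assumes the convention of never letting the factor fall below 1. These details, the function names and the example values are assumptions for illustration and are not spelled out in the text above.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_stats):
    """Estimate the inflation factor from chi-square statistics at null loci.

    The factor is floored at 1 so that statistics are never inflated
    (an assumed convention).
    """
    return max(1.0, median(null_stats) / CHI2_1DF_MEDIAN)


def chi2_1df_p_value(stat):
    """Upper-tail p-value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(stat / 2.0))


def genomic_control(stat, null_stats):
    """Return the corrected statistic and its p-value."""
    corrected = stat / inflation_factor(null_stats)
    return corrected, chi2_1df_p_value(corrected)


# Hypothetical statistics, for illustration only.
null_stats = [0.2, 0.5, 0.9, 1.1, 1.4]
corrected, p_value = genomic_control(6.0, null_stats)
print(inflation_factor(null_stats), corrected, p_value)
```

A real analysis would use many more null loci than the five shown here, since the median of a handful of statistics is a very noisy estimate of the inflation.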

Hardy–Weinberg equilibrium

Hardy–Weinberg equilibrium indicates that in a freely breeding population genotype frequencies can be determined directly from allele frequencies. It can be readily assessed with the use of a goodness-of-fit chi-square test for bi-allelic markers. Testing for Hardy–Weinberg equilibrium is frequently used in genetic association studies to test for internal validity. Failure to demonstrate Hardy–Weinberg equilibrium among controls in an association study can result from genotyping errors, inbreeding, population substructure or selection bias [71].
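The goodness-of-fit test for a bi-allelic marker can be sketched as follows in Python. Expected genotype counts are p²N, 2pqN and q²N, where p and q are the allele frequencies estimated from the observed genotypes, and the statistic is referred to a chi-square distribution with 1 degree of freedom. The function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions.

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele a
    if p == 0 or q == 0:
        raise ValueError("marker is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a 1-df chi-square variable.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts in controls, for illustration only.
chi2, p_value = hwe_chi_square(298, 489, 213)
print(chi2, p_value)
```

The chi-square approximation becomes unreliable when an expected genotype count is small, as with rare alleles; an exact test is usually preferred in that setting.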

Conclusions

There is considerable interest in testing candidate genes that may increase the susceptibility to or severity of ALI. There is some evidence that dysregulated coagulation and impaired fibrinolysis may contribute to the pathogenesis of ALI. Altered plasma and pulmonary edema fluid concentrations of protein C, thrombomodulin, tissue factor, and PAI-1 are biological markers of worse clinical outcomes in patients with ALI. However, it is not known whether the altered concentrations of these mediators are dependent only on environmental factors, such as the severity of pneumonia, aspiration or shock, or also on underlying genetic factors. Several polymorphisms in genes encoding these coagulation proteins are present in the general population with high allele frequencies and are associated with altered levels of the gene products and clinical phenotypes of disordered coagulation. Therefore, clinical studies are needed to test the possible contributions of these polymorphisms to the susceptibility to and severity of ALI. These studies should include determination of the polymorphism at the genetic level as well as measurement of levels of the respective gene product in the plasma and broncho-alveolar lavage fluid of patients with ALI. Several issues need to be carefully managed during the design of such studies, including definition of the phenotype, selection of controls, choice of candidate genes, attention to linkage dysequilibrium and adjustment for multiple comparisons and population stratification.