Human collectin-11 (COLEC11) and its synergic genetic interaction with MASP2 are associated with the pathophysiology of Chagas Disease

Chagas Disease (CD) is an anthropozoonosis caused by Trypanosoma cruzi. With complex pathophysiology and variable clinical presentation, CD outcome can be influenced by parasite persistence and the host immune response. Complement activation is one of the primary defense mechanisms against pathogens, which can be initiated via pathogen recognition by pattern recognition molecules (PRMs). Collectin-11 is a multifunctional soluble PRM lectin, widely distributed throughout the body, with important participation in host defense, homeostasis, and embryogenesis. In complex with mannose-binding lectin-associated serine proteases (MASPs), collectin-11 may initiate the activation of complement, playing a role against pathogens, including T. cruzi. In this study, collectin-11 plasma levels and COLEC11 variants in exon 7 were assessed in a Brazilian cohort of 251 patients with chronic CD and 108 healthy controls. Gene-gene interactions between COLEC11 and MASP2 variants were analyzed. Collectin-11 levels were significantly decreased in CD patients compared to controls (p<0.0001). The allele rs7567833G, the genotypes rs7567833AG and rs7567833GG, and the COLEC11*GGC haplotype were related to T. cruzi infection and clinical progression towards symptomatic CD. COLEC11 and MASP2*CD risk genotypes were associated with cardiomyopathy (p = 0.014; OR 9.3, 95% CI 1.2–74) and with the cardiodigestive form of CD (p = 0.005; OR 15.2, 95% CI 1.7–137), suggesting that both loci act synergistically in immune modulation of the disease. The decreased levels of collectin-11 in CD patients may be associated with the disease process. The COLEC11 variant rs7567833G and also the COLEC11 and MASP2*CD risk genotype interaction were associated with the pathophysiology of CD.


Introduction
Chagas Disease (CD) is a neglected anthropozoonosis in which, classically, primary infection with the protozoan Trypanosoma cruzi, transmitted by blood-sucking bugs, can occur during early childhood and may remain clinically silent for decades [1,2]. Approximately 30% of chronically infected individuals develop cardiac and/or digestive alterations; the majority, however, remain asymptomatic [1,3]. In recent decades, increasing migration from endemic to non-endemic countries has altered the epidemiological scenario, turning CD into a global health concern [4,5].
The complex pathophysiology of CD is influenced by several factors, in particular the genetic variability of the parasite and the intensity of the host immune response, both of which play a critical role in disease outcome [3,6]. Parasite persistence depends on the ability of T. cruzi to evade host defense mechanisms. In this context, the host genetic background plays an important role in the establishment of infection and in the clinical presentation of CD [7][8][9].
The complement system is a central component of the innate immune response and one of the first lines of defense against pathogens; in its lectin pathway, carbohydrate or acetylated patterns on the pathogen surface are recognized by pattern recognition molecules (PRMs) such as lectins [10]. The initiators of the lectin pathway are the ficolins (ficolin-1, ficolin-2 and ficolin-3) and the collectins (mannose-binding lectin, MBL, and collectin-11, also known as collectin kidney 1, CL-11 or CL-K1) [11]. Deficiency in the components of this pathway can critically impair immune competence and may therefore lead to susceptibility to infectious diseases [12,13]. Moreover, genetic variation in collectins may alter protein structure, thereby affecting their ability to recognize parasites, including T. cruzi, and contributing to parasite persistence. Indeed, genetic variation in PRMs of the lectin pathway has been associated with disease establishment and clinical progression of CD [13][14][15].
The infective metacyclic trypomastigote form of T. cruzi displays a broad range of carbohydrates on its surface, including mannose, N-acetyl-D-glucosamine, galactose and fucose on glycosylated proteins [16]. These glycoconjugates act as pathogen-associated molecular patterns (PAMPs) with which PRMs can interact [9,17]. Collectins in complex with MBL-associated serine proteases (MASPs) bind to glycosylated molecules on the surface of T. cruzi in the presence of Ca2+, activating the proteolytic cascade [18,19]. This cascade propagates complement activation, which may ultimately result in the elimination of pathogens [10,20].
Collectin-11 is a multifunctional soluble PRM lectin with important participation in host defense, homeostasis and embryogenesis [19,21,22]. It is expressed in a wide range of tissues, with the highest abundance in the adrenal glands, kidneys and liver [10,23]. Recently, collectin-11 has been found to circulate as a heteromeric complex with collectin-10 [23]. Collectin-11 binds sugars such as fucose, mannose, N-acetyl-D-galactosamine and N-acetyl-D-glucosamine [21][22][23]. Similar to MBL, the monomer is composed of a collagen-like domain and a carbohydrate recognition domain (CRD), linked by a helical neck [10,24]. The gene encoding collectin-11, COLEC11, is located on chromosome 2p25.3 (OMIM 612502) and comprises 7 exons that encode the canonical protein [23]. COLEC11 variability has been shown to interfere with expression and with calcium and carbohydrate binding, possibly by affecting protein folding [23]. Three distinct genetic variations in exon 7 of COLEC11 have been associated with the 3MC (Carnevale, Mingarelli, Malpuech and Michels) developmental syndromes [25]: two single amino acid substitutions, p.Ser169Pro and p.Gly204Ser, and one deletion, p.Ser217del. All three alter the primary structure of the CRD [24], and individuals homozygous for p.Gly204Ser have no detectable collectin-11 in serum [24]. Moreover, the exon 7 variant p.His219Arg (rs7567833A>G) has been associated with a higher prevalence of urinary schistosomiasis [26].
Collectin-11 binds strongly to fucosylated proteins [27], such as those of the Tc-85 protein family expressed on the surface of T. cruzi metacyclic trypomastigotes, which are involved in parasite entry into host cells [28]. In addition, collectin-11 is structurally similar to MBL, and both MBL levels and MBL2 genetic variants have been associated with disease susceptibility and the pathophysiology of CD [29]. Considering these observations, collectin-11 plasma levels and COLEC11 exon 7 variants were assessed to investigate their potential role in chronic CD. Moreover, because collectin-11 interacts with MASPs to activate complement, the gene-gene interaction between COLEC11 and MASP2 was assessed to evaluate the additive genetic effect of the two loci and its role in the pathophysiology of this chronic disease.
[…] and indirect immunofluorescence assays (IMUNO-Con Chagas, WAMA Diagnóstica, São Paulo, Brazil; sensitivity and specificity 100%). Clinical data were obtained from medical records and interviews. Patients younger than 18 years, with recent infection, or with suspected non-chagasic cardiomyopathy were excluded. Ancestry was self-reported by the patient at the first interview. Demographic and clinical characteristics of the distinct CD forms are shown in Table 1. Patients with cardiomyopathy were graded according to the heart failure classification of the American Heart Association, adapted for CD [30]: A, altered electrocardiogram (ECG) and normal echocardiogram (ECHO), absence of cardiac insufficiency (CI); B1, altered ECG, left ventricular ejection fraction (LVEF) > 45%, absence of CI; B2, altered ECHO, LVEF < 45%, absence of CI; C, altered ECG and ECHO, compensable CI; D, altered ECG and ECHO, refractory CI.
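To make the staging criteria above explicit, they can be encoded as a small Python function, together with the cardiac subgroups used in the statistical analysis. This is a hypothetical illustration, not the authors' procedure: the function and dictionary names are invented, stages C and D are assumed to be determined by the CI category alone, B1 and B2 are assumed to require an altered ECHO, and an LVEF of exactly 45%, which the criteria leave undefined, is assigned to B1.

```python
from typing import List, Optional

# Hypothetical encoding of the CD-adapted staging described in the text.
# Assumptions (not stated in the source): C/D depend only on the CI category;
# B1/B2 require an altered ECHO; LVEF == 45% is assigned to B1.
CI_CATEGORIES = ("none", "compensable", "refractory")


def cardiac_stage(ecg_altered: bool, echo_altered: bool,
                  lvef: Optional[float], ci: str) -> Optional[str]:
    """Return 'A', 'B1', 'B2', 'C', 'D', or None if no cardiac alteration."""
    if ci not in CI_CATEGORIES:
        raise ValueError(f"unknown CI category: {ci!r}")
    if ci == "refractory":
        return "D"
    if ci == "compensable":
        return "C"
    if not ecg_altered and not echo_altered:
        return None  # no cardiac alterations: not staged
    if not echo_altered:
        return "A"  # altered ECG, normal ECHO, no CI
    if lvef is None:
        raise ValueError("LVEF is required when the ECHO is altered")
    return "B1" if lvef >= 45 else "B2"


# Cardiac subgroups as listed in the statistical analysis section.
CARDIAC_SUBGROUPS = {
    "cardiomyopathy": {"B2", "C", "D"},
    "without ECHO alterations": {"A"},
    "with ECHO alterations": {"B1", "B2", "C", "D"},
    "without heart failure": {"A", "B1", "B2"},
    "with heart failure": {"C", "D"},
}


def cardiac_subgroups(stage: str) -> List[str]:
    """Names of the analysis subgroups that include the given stage."""
    return [name for name, stages in CARDIAC_SUBGROUPS.items() if stage in stages]
```

For example, `cardiac_subgroups("B2")` places a B2 patient in the cardiomyopathy, with ECHO alterations and without heart failure groups.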
The digestive forms of Chagas disease were identified by alterations in esophagography and barium enema radiological exams, used to diagnose megaesophagus and/or megacolon. Chronic asymptomatic individuals (with the indeterminate form) had reactive serology and/or a positive parasitological test for T. cruzi but no clinical symptoms specific to CD, and had normal ECG and normal chest, esophagus and colon radiological exams [30].

Study population
A total of 108 healthy Brazilians [mean age 51 years; 52 (48.1%) females, 56 (51.9%) males; 95 (88%) Euro-Brazilian, 10 (9.3%) Afro-Brazilian, 2 (1.8%) Asian, 1 (0.9%) Amerindian] served as the control group. All control individuals were selected consecutively from a blood bank in the same geographic region as the patients with chronic CD. Following Brazilian health regulations, the blood donors were screened for CD, syphilis, hepatitis B, hepatitis C, HIV and human T-cell lymphotropic viruses 1 and 2 using high-sensitivity assays. Additionally, self-reported ancestry and history of autoimmune diseases and cancer were obtained during the pre-selection interview.

Ethics statement
The study protocol was approved by the local Ethics Committee (CEP/HC-UFPR n. 360.918/2013-08), and all adult patients and controls provided written informed consent in accordance with the Declaration of Helsinki. No children were enrolled in this study.

Quantification of human collectin-11 plasma levels
Collectin-11 plasma levels were determined in 233 patients and 102 controls using a commercial high-sensitivity ELISA kit [Human Collectin-11 (COLEC11)/abx517452, Abbexa Ltd, Cambridge, UK] according to the manufacturer's instructions. The limit of detection was 78 pg/ml. Plasma was not available from 18 patients and six controls. In total, 186 patients and 95 controls had both genotyping and ELISA data. Additionally, C-reactive protein (CRP) [13,31], pentraxin 3 (PTX3) [31], MASP-2 [14] and complement receptor 1 (CR1) [32] levels obtained in previous studies of the same cohort were used for correlation analysis with collectin-11.
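Because each correlation can only use samples that have both measurements, the pairing step and a rank correlation can be sketched as follows. The correlation method is not stated in this section; Spearman's rank correlation (`scipy.stats.spearmanr`) is assumed here because plasma levels were analyzed with nonparametric tests. The helper name, sample IDs and values are hypothetical.

```python
from typing import Dict, Optional, Tuple

from scipy import stats


def paired_spearman(a: Dict[str, Optional[float]],
                    b: Dict[str, Optional[float]]) -> Tuple[float, float, int]:
    """Spearman correlation over sample IDs measured (non-missing) in both dicts.

    Returns (rho, p_value, n_pairs). Hypothetical helper for illustration.
    """
    shared = sorted(k for k in a.keys() & b.keys()
                    if a[k] is not None and b[k] is not None)
    if len(shared) < 3:
        raise ValueError("need at least 3 paired samples")
    x = [a[k] for k in shared]
    y = [b[k] for k in shared]
    rho, p = stats.spearmanr(x, y)
    return float(rho), float(p), len(shared)


# Hypothetical values, for illustration only (not study data).
collectin11 = {"P1": 1.0, "P2": 2.0, "P3": 3.0, "P4": 4.0, "P5": None}
lvef = {"P1": 40.0, "P2": 30.0, "P3": 35.0, "P4": 10.0, "P6": 55.0}
rho, p, n = paired_spearman(collectin11, lvef)
```

Here P5 (missing collectin-11) and P6 (no collectin-11 entry) are dropped, so the correlation uses four pairs.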

COLEC11 genotyping
To assess the distribution of three COLEC11 exon 7 variants (Fig 1), rs148786016G>A (g.3643816G>A, p.Gly172Ser), rs7567833A>G (g.364395A>G, p.His219Arg) and rs114716171C>T (g.3644079C>T, p.Thr259=), the entire COLEC11 exon 7, including its intron-exon boundaries, was directly sequenced in 204 patients with chronic CD and 101 healthy controls. DNA could not be isolated in sufficient amount from 47 patients and seven controls, who were therefore excluded from further genetic analyses. Genomic DNA was extracted from buffy coats using the QIAamp Blood Mini Kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer's instructions. The COLEC11 reference sequence (ENST00000349077.8) was retrieved from the Ensembl database (www.ensembl.org). Primers targeting exon 7 of the COLEC11 gene were those used by Antony et al. [6] and were synthesized commercially (Eurofins Genomics, Ebersberg, Germany). PCR amplifications were carried out in a 25 μl reaction volume containing 10x PCR buffer, 2 mM MgCl2, 0.125 mM dNTPs, 0.2 μM of each primer, 1 unit of Taq polymerase (Qiagen, Germany) and 20 ng of genomic DNA on a Mastercycler Nexus Gradient (Eppendorf, Germany). Cycling parameters were initial denaturation at 94 °C for 3 minutes; 35 cycles of denaturation at 94 °C for 30 seconds, annealing at 59.8 °C for 30 seconds and elongation at 72 °C for 1 minute; and a final elongation step at 72 °C for 10 minutes. PCR products were stained with SYBR Safe DNA Gel Stain (Invitrogen, Carlsbad, USA) and visualized on a 1.5% agarose gel. PCR products were purified using ExoSAP-IT (USB-Affymetrix, Santa Clara, USA) and used directly as templates for sequencing with the BigDye Terminator cycle sequencing kit (v3.1; Applied Biosystems, Texas, USA) on an ABI 3130xl DNA Analyzer.
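The cycling parameters above can be written as a small program description, which makes the per-cycle timing and the total run time easy to check. This is an illustrative Python sketch, not an export from the Mastercycler software; the `Step` class and `PROGRAM` layout are hypothetical, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# (steps, repeats) blocks transcribing the cycling parameters in the text.
PROGRAM: List[Tuple[List[Step], int]] = [
    ([Step("initial denaturation", 94.0, 180)], 1),
    ([Step("denaturation", 94.0, 30),
      Step("annealing", 59.8, 30),
      Step("elongation", 72.0, 60)], 35),
    ([Step("final elongation", 72.0, 600)], 1),
]


def expand(program: List[Tuple[List[Step], int]]) -> List[Step]:
    """Flatten the program into the ordered list of steps the cycler runs."""
    return [step for steps, repeats in program
            for _ in range(repeats) for step in steps]


def total_hold_seconds(program: List[Tuple[List[Step], int]]) -> int:
    """Sum of programmed hold times; temperature ramping is not modelled."""
    return sum(step.seconds for step in expand(program))
```

With these parameters the programmed hold times add up to 4,980 s (83 min); an actual run takes longer because of ramping.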
DNA polymorphisms were identified by aligning the sequences to the COLEC11 reference sequence (ENST00000349077.8) in Geneious v11.0.3 (Biomatters Ltd, Auckland, New Zealand) and reconfirmed visually in the respective electropherograms. MASP2*CD genotypes previously determined in the same cohort by our research group [14] were used in the gene-gene interaction analysis.
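When heterozygous positions are called in a Sanger consensus, they typically appear as IUPAC ambiguity codes (for example R for A/G, which is what an rs7567833 AG heterozygote would show). The hypothetical helper below, which is not part of Geneious, converts the base at a known position of an aligned consensus into a genotype and a carrier flag. The positions and sequences in the example are invented, and every call would still need confirmation on the electropherogram, as was done in the study.

```python
# Hypothetical helper: two-base IUPAC ambiguity codes used for heterozygotes.
IUPAC_HET = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def call_genotype(consensus: str, pos: int, ref: str, alt: str) -> str:
    """Return the biallelic genotype (e.g. 'AA', 'AG', 'GG') at 0-based `pos`."""
    base = consensus[pos].upper()
    if base in ("A", "C", "G", "T"):
        if base not in (ref, alt):
            raise ValueError(f"unexpected allele {base!r} at position {pos}")
        return base * 2  # homozygous call
    if base in IUPAC_HET:
        if set(IUPAC_HET[base]) != {ref, alt}:
            raise ValueError(f"ambiguity code {base!r} does not match {ref}/{alt}")
        return ref + alt  # heterozygous call, reference allele first
    raise ValueError(f"uncallable base {base!r} at position {pos}")


def carries(genotype: str, allele: str) -> bool:
    """True if the genotype contains at least one copy of `allele`."""
    return allele in genotype
```

For an A>G site such as rs7567833, an `R` in the consensus yields `"AG"`, and `carries("AG", "G")` marks the individual as a G allele carrier.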
In silico prediction of the biological consequence of rs7567833 (p.His219Arg)
An in silico analysis of the possible effects of rs7567833 (p.His219Arg) on protein structure and function was performed. SIFT (Sorting Intolerant From Tolerant) is an algorithm based on multiple sequence alignment that infers whether an amino acid substitution may affect protein function, on the premise that highly conserved amino acids are less tolerant to substitution than poorly conserved ones (http://sift.bii.a-star.edu.sg/) [33]. PolyPhen-2 uses a trained naive Bayes classifier that evaluates physical and comparative features to predict the impact of a mutation on protein structure and function (http://genetics.bwh.harvard.edu/pph2/) [34]. The Ensembl Variant Effect Predictor (VEP) infers the effect of variants on protein sequence using SIFT and PolyPhen-2 predictions together with the extensive genomic annotation of the Ensembl database (http://www.ensembl.org/vep) [34][35][36]. SNAP2 is a neural network-based classifier trained with a backpropagation algorithm that predicts whether a substitution affects protein function (https://www.rostlab.org/services/snap/) [37][38][39]. Combined Annotation Dependent Depletion (CADD) scores the deleteriousness of single nucleotide variants in the human genome (https://cadd.gs.washington.edu/snv) [40].
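The tools above report scores that must be mapped to categories. The sketch below applies commonly used cut-offs, which are assumptions to be checked against each tool's documentation for the version used: SIFT scores below 0.05 are deleterious; PolyPhen-2 HumDiv scores of at least 0.957 and 0.453 are probably and possibly damaging; SNAP2 scores above 0 predict an effect; and CADD PHRED scores of at least 20 place a variant among the top 1% of possible substitutions. Because CADD PHRED is defined as -10·log10 of the rank fraction, the score can also be converted back to that fraction. Function and constant names are hypothetical.

```python
from typing import Dict, Optional

# Interpretation cut-offs assumed for illustration only; confirm them against
# the documentation of the tool version actually used.
SIFT_DELETERIOUS_BELOW = 0.05
POLYPHEN_PROBABLY = 0.957   # PolyPhen-2 HumDiv, probably damaging
POLYPHEN_POSSIBLY = 0.453   # PolyPhen-2 HumDiv, possibly damaging
CADD_PHRED_CUTOFF = 20.0    # top 1% of all possible SNVs


def interpret_predictions(sift: Optional[float] = None,
                          polyphen: Optional[float] = None,
                          snap2: Optional[float] = None,
                          cadd_phred: Optional[float] = None) -> Dict[str, str]:
    """Map raw predictor scores to categorical calls; omitted tools are skipped."""
    calls: Dict[str, str] = {}
    if sift is not None:
        calls["SIFT"] = "deleterious" if sift < SIFT_DELETERIOUS_BELOW else "tolerated"
    if polyphen is not None:
        if polyphen >= POLYPHEN_PROBABLY:
            calls["PolyPhen-2"] = "probably damaging"
        elif polyphen >= POLYPHEN_POSSIBLY:
            calls["PolyPhen-2"] = "possibly damaging"
        else:
            calls["PolyPhen-2"] = "benign"
    if snap2 is not None:
        # SNAP2 scores run from -100 (neutral) to +100 (effect).
        calls["SNAP2"] = "effect" if snap2 > 0 else "neutral"
    if cadd_phred is not None:
        calls["CADD"] = ("likely deleterious" if cadd_phred >= CADD_PHRED_CUTOFF
                         else "below cut-off")
    return calls


def cadd_rank_fraction(phred: float) -> float:
    """Fraction of possible SNVs scoring at least this high (PHRED = -10*log10(fraction))."""
    return 10 ** (-phred / 10)
```

Applied to the SNAP2 (69) and CADD (20.5) scores reported for rs7567833, this yields "effect" and "likely deleterious", and a CADD rank fraction of about 0.9%.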

Statistical analysis
Collectin-11 plasma levels were tested for normality using the Shapiro-Wilk test and compared between groups using the nonparametric Kruskal-Wallis and Mann-Whitney tests in GraphPad Prism (version 5), with scatter plots displaying medians and percentiles. In all analyses, CD patients were compared by clinical form, namely indeterminate/asymptomatic, cardiac (groups A+B1/2+C+D), digestive and cardiodigestive, and were also grouped as symptomatic patients (cardiac + digestive + cardiodigestive forms). Patients with the cardiac form were further grouped as with cardiomyopathy (B2+C+D), without ECHO alterations (A), with ECHO alterations (B1/2+C+D), without heart failure (A+B1/2) and with heart failure (C+D). Multiple logistic regression was executed in a multivariate model using a
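The group comparisons described above can be sketched with SciPy, which implements the same tests that were run in GraphPad Prism. This is an illustrative sketch, not the authors' analysis: the function name and group labels are hypothetical, and the input values are synthetic draws that exist only to make the example runnable.

```python
from typing import Dict

import numpy as np
from scipy import stats


def compare_levels(levels: Dict[str, np.ndarray], control: str = "control") -> dict:
    """Median/IQR per group, Shapiro-Wilk per group, Kruskal-Wallis across
    patient groups and two-sided Mann-Whitney for all patients vs controls."""
    patient_groups = [np.asarray(v) for g, v in levels.items() if g != control]
    return {
        "median_iqr": {g: (float(np.median(v)), *map(float, np.percentile(v, [25, 75])))
                       for g, v in levels.items()},
        "shapiro_p": {g: float(stats.shapiro(v).pvalue) for g, v in levels.items()},
        "kruskal_p": float(stats.kruskal(*patient_groups).pvalue),
        "mannwhitney_p": float(stats.mannwhitneyu(
            np.concatenate(patient_groups), levels[control],
            alternative="two-sided").pvalue),
    }


# Synthetic example data (NOT the study's measurements).
rng = np.random.default_rng(0)
levels = {
    "control": rng.lognormal(1.0, 0.4, 40),
    "indeterminate": rng.lognormal(0.8, 0.4, 30),
    "cardiac": rng.lognormal(0.8, 0.4, 30),
    "digestive": rng.lognormal(0.8, 0.4, 15),
    "cardiodigestive": rng.lognormal(0.8, 0.4, 10),
}
result = compare_levels(levels)
```

The Kruskal-Wallis test addresses differences among the clinical forms, while the Mann-Whitney test compares pooled patients with controls; the symptomatic and cardiac subgroup comparisons follow the same pattern with regrouped inputs.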

Association of COLEC11 genetic variants with Chagas disease
The distribution of COLEC11 genotypes did not deviate from Hardy-Weinberg equilibrium in the control group (rs148786016, not applicable, monomorphic locus; rs7567833, p = 0.38; rs114716171, p = 1.00), in the patient group (rs148786016, p = 1.00; rs7567833, p = 0.16; rs114716171, p = 1.00) or in the asymptomatic group (rs148786016, not applicable, monomorphic locus; rs7567833, p = 0.67; rs114716171, p = 1.00). No association was found between the analyzed genetic variants and collectin-11 plasma levels. The frequency of the COLEC11 rs7567833G allele was significantly higher in chronic CD patients (p = 0.005; OR 2.3, 95% CI 1.2-4.2) and in patients with the cardiodigestive form (p = 0.002; OR 3.9, 95% CI 1.7-8.8) than in controls (Table 2). Likewise, the rs7567833 AG and GG genotypes were significantly more frequent in chronic CD patients than in controls (p = 0.028; OR 2.2, 95% CI 1.1-4.4) (Table 2). In addition, G allele carriers (AG and GG genotypes of rs7567833) were more frequent among patients with the cardiodigestive form of CD than among controls (p = 0.002; OR 5.1, 95% CI 1.9-14.2) (Table 2). No significant differences in allelic or genotypic frequencies were found between controls and patients for COLEC11 variants rs148786016 and rs114716171 (Table 2). The G allele (p = 0.006; OR 2.5, 95% CI 1.3-4.8) and the AG and GG genotypes of rs7567833 (p = 0.023; OR 2.5, 95% CI 1.1-5.7) were more frequent in patients with cardiomyopathy than in controls (Table 3). Considering the different stages of cardiac pathology, the G allele and the AG and GG genotypes of rs7567833 were significantly more frequent in patients with cardiomyopathy with ECHO alterations than in controls (p = 0.01; OR 2.5, 95% CI 1.2-4.9; and p = 0.03; OR 2.5, 95% CI 1.1-5.9, respectively).
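The Hardy-Weinberg p-values above come from comparing observed genotype counts with the counts expected from the allele frequencies. This section does not state which test was used (the several p = 1.00 values suggest an exact test may have been applied); the sketch below shows the classical 1-degree-of-freedom chi-square version, which also makes clear why the test is not applicable to a monomorphic locus. The function name and counts are hypothetical.

```python
from typing import Tuple

from scipy.stats import chi2


def hwe_chisq(n_ref_hom: int, n_het: int, n_alt_hom: int) -> Tuple[float, float]:
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Returns (statistic, p_value). Raises ValueError for a monomorphic locus,
    where expected counts of zero make the test not applicable.
    """
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p                            # alternative allele frequency
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic locus: HWE test not applicable")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, float(chi2.sf(stat, df=1))
```

With small genotype counts, as in the rarer variants here, an exact test is generally preferred over this chi-square approximation.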
In addition, the minor allele G and G allele carriers (AG and GG genotypes of rs7567833) were more frequent among patients with heart failure than among controls, although the difference was not statistically significant after logistic regression. No association was found for patients with involvement of the digestive tract only (Table 3).
In silico analysis predicted that the non-synonymous variant rs7567833A>G may have a functional impact on collectin-11 (SNAP2 score 69), with a likely deleterious effect on protein function (CADD score 20.5). However, SIFT predicted this variant to be tolerated, and PolyPhen-2 classified it as benign regarding protein structure and function.

Gene-gene interaction of COLEC11 and MASP2 variants in Chagas disease
Considering the biological relationship between collectin-11 and MASP-2, and that MASP2 genetic variants have been associated with a high risk of cardiomyopathy in chronic CD [14], the genetic interaction between COLEC11 and MASP2 variants was analyzed. For this, the combined effect of the risk genotypes for cardiac involvement in COLEC11 (rs7567833AG and rs7567833GG) and MASP2 (MASP2*CD carriers, g.1961795C>A, p.D371Y) (S1 Table) on chronic chagasic cardiomyopathy was calculated. The frequency of risk genotypes at both loci (COLEC11 AG+GG and MASP2*CD carriers) was higher in patients with the cardiodigestive form (21%) and with cardiomyopathy (13%) than in healthy controls (2%) (p = 0.005; OR 15.2, 95% CI 1.7-137; and p = 0.014; OR 9.3, 95% CI 1.2-74, respectively) (Table 6). As recommended for gene-gene interaction analysis in case-control association studies, a dimensionality reduction method, model-based multifactor dimensionality reduction (MB-MDR), was applied to test the association of COLEC11 AG+GG and MASP2*CD genotypes with a CD risk phenotype. With this approach, the COLEC11 and MASP2 risk genotypes showed a high-risk interaction for CD, which remained significant after permutation-based adjustment (100 permutations) for patients with cardiomyopathy compared with controls (adjusted permutation p = 0.05) and for patients with the cardiodigestive form compared with infected asymptomatic individuals (adjusted permutation p = 0.04).
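The combined-genotype comparison reduces to a 2x2 table of cases and controls with or without risk genotypes at both loci. The sketch below flags combined risk and computes the odds ratio, a Woolf (log-scale) 95% confidence interval and a two-sided Fisher exact p-value; whether these are exactly the methods used for Table 6 is an assumption, the function names are hypothetical, and the sketch does not implement MB-MDR. The counts in the example are hypothetical.

```python
import math
from typing import Tuple

from scipy.stats import fisher_exact


def has_combined_risk(colec11_rs7567833: str, masp2_cd_carrier: bool) -> bool:
    """Risk at both loci: COLEC11 rs7567833 AG or GG and MASP2*CD carrier."""
    return colec11_rs7567833 in {"AG", "GG"} and masp2_cd_carrier


def odds_ratio(a: int, b: int, c: int, d: int,
               z: float = 1.96) -> Tuple[float, float, float, float]:
    """OR with Woolf (log) 95% CI and two-sided Fisher exact p-value.

    a/b: cases with/without the risk genotype; c/d: controls with/without it.
    Returns (OR, CI lower, CI upper, p_value).
    """
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: Woolf CI undefined without a continuity correction")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    _, p = fisher_exact([[a, b], [c, d]])
    return or_, lower, upper, float(p)


# Hypothetical counts, for illustration only.
or_, lo, hi, p = odds_ratio(10, 40, 5, 45)
```

Because the standard error sums the reciprocals of all four cells, a single small cell (such as the few controls carrying both risk genotypes) widens the interval sharply, which is consistent with the very wide intervals reported above.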

Discussion
Pathogen recognition is a critical step in host defense. The lectin pathway activates the complement system upon recognition of microbial surface carbohydrate patterns by PRMs such as collectin-11. This recognition can lead to pathogen lysis through formation of the membrane attack complex and may help control the parasite burden [46]. Previous reports have shown that the PRMs MBL, ficolins and collectin-11 can recognize and bind specific glycoproteins on the surface of pathogens, including T. cruzi [47][48][49]. Association studies have also shown that the lectins ficolin-2 [13] and MBL [15] are involved in disease progression of chronic CD; however, these results have not yet been tested in in vitro or in vivo experimental models. In this study, individuals chronically infected with T. cruzi had lower collectin-11 levels than healthy controls; however, this was not associated with the genetic variants analyzed. It should be noted that other variants that may modulate COLEC11 expression, such as rs13417396 (intron 4), rs11895384 (intron 5), rs10185914 (intron 6) and rs10166336 (intron 6) (https://ldlink.nci.nih.gov/), were not investigated in this study. Polymorphisms in the promoter region do not appear to play a role in collectin-11 expression [26,50]. Alternatively, the lower collectin-11 levels found in patients may be due to consumption of collectin-11 during chronic T. cruzi infection. Moreover, no difference in protein levels was found between indeterminate/asymptomatic and symptomatic patients, indicating that the different CD phenotypes are not directly driven by collectin-11. However, this lack of difference may be due to the limited number of patients per clinical group and/or the difficulty of detecting minimal changes in asymptomatic patients with conventional medical examinations.
Lower levels of collectin-11 have been associated with other infectious diseases, including Schistosoma haematobium infection [26] and tuberculosis [51]. In line with recent studies, collectin-11 plasma levels showed no correlation with CRP or PTX3 levels, reinforcing that it is not an acute phase protein [50]. The weak negative correlation between collectin-11 levels and LVEF (r = -0.15, p = 0.0419) may point to a relationship between collectin-11 and cardiac function in patients with chronic CD; additional studies are necessary to confirm this hypothesis. The positive association of the AG and GG genotypes and the G allele of rs7567833 with chronic CD may be related to the functional properties of the collectin-11 molecule. Interestingly, G, the minor allele, is the ancestral allele [52], and its reduced frequency suggests that this polymorphism may have been subject to selection pressure over time [53], although genetic drift resulting from human migration is an alternative explanation. This variant (rs7567833A>G) results in an amino acid change (p.His219Arg) in the carbohydrate recognition domain, which probably interferes with its carbohydrate-binding affinity and thereby alters the potential of collectin-11 to activate the lectin pathway (Fig 1) [24,52]. Indeed, collectin-11 p.His219Arg (rs7567833A>G) was predicted in silico to have a functional impact with a likely deleterious effect on protein function (SNAP2, CADD). Nevertheless, p.His219Arg did not affect collectin-11 plasma levels in either CD patients or controls, in agreement with the findings of Bayarri-Olmos and collaborators [52].
As seen for ficolin-2, amino acid substitutions in the pathogen recognition domain could affect the binding affinity of the variant molecule for its ligand and thus its complement activation potential [52]. Two non-synonymous FCN2 polymorphisms located near the binding site markedly alter its binding capacity [54]. Interestingly, the FCN2*258S substitution, which affects the binding affinity of ficolin-2, was associated with the development of the cardiodigestive form of chronic CD [13]. A similar pattern was observed for the COLEC11 variant rs7567833A>G: the G allele, G allele carriers (AG and GG genotypes) and the COLEC11*GGC haplotype were associated with the cardiodigestive form of CD, indicating that this variant might predispose to clinical progression of chronic CD. Additionally, in a study of another C-type collectin, alleles causing MBL deficiency were associated with clinical progression of CD, and MBL2 genotypes causing MBL deficiency were associated with heart damage [29]. Likewise, the minor allele G (rs7567833G), the rs7567833AG and rs7567833GG genotypes and the COLEC11*GGC haplotype were associated with cardiomyopathy. Unlike MBL deficiency alleles, the COLEC11 variant analyzed here does not lead to protein deficiency, but it may alter protein function and thereby contribute to infection and the pathophysiology of CD. Nevertheless, functional studies of both the p.219His and p.219Arg forms of collectin-11 must be performed to define their effect on the interaction of collectin-11 with its ligands.
Collectin-11 is known to bind PAMPs and activate MASP-2 to initiate the lectin pathway, stimulating immune processes [20]. Here, the results indicated a gene-gene interaction between COLEC11 (rs7567833A>G) and the MASP2*CD diplotype (g.1961795C>A, p.D371Y). Patients carrying both risk genotypes had 15.2-fold higher odds of developing the cardiodigestive form of CD and 9.3-fold higher odds of cardiomyopathy. This additive or synergistic interaction may contribute to the immune modulation of the disease. Nevertheless, the increased risk of developing the cardiodigestive form should be interpreted with caution given the small sample size of this study, and analysis of a larger population would be required to confirm the role of this genetic interaction. The mechanisms by which these two genes interact in the pathophysiology of CD are not clear, but the two proteins, collectin-11 and MASP-2, do interact during activation of the lectin pathway. In addition, these results are corroborated by previous findings that MASP2*CD genotypes are associated with a high risk of CD cardiomyopathy [the g.1961795C, p.371D diplotype was more frequent in symptomatic patients (p = 0.012, OR 3.11) and in patients with cardiomyopathy (p = 0.012, OR 13.53) than in asymptomatic patients] [14]. This is the first study analyzing the impact of gene-gene interaction between markers of innate immunity in CD, and the combined genetic analysis used here may provide further insight into the complex pathogenesis of this disease.
The low number of patients in some groups, especially those with the cardiodigestive form, is a limitation of this study and is partly due to the unequal distribution of patients across the different clinical forms. This reduces statistical power (<70%) (S2 Table), requiring careful interpretation of the results for the clinical forms, especially the cardiodigestive form. For these reasons, further studies, including analysis of a larger population and functional approaches, are necessary to better understand the role of collectin-11 in the pathophysiology of CD. In addition, ancestry was self-reported by the participants, which may introduce bias into the ancestry data. Nevertheless, the fact that the same results were reproduced in different comparisons leads us to suggest that the associations are indeed reliable.
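To illustrate how small groups limit power, the sketch below approximates the power of a two-sided comparison of two proportions (for example, risk-genotype frequencies in a patient subgroup and in controls) using Cohen's arcsine effect size h and a normal approximation. The method behind S2 Table is not described here, so this is a generic approximation; the function name, proportions and group sizes in the example are hypothetical.

```python
from math import asin, sqrt

from scipy.stats import norm


def power_two_proportions(p1: float, p2: float, n1: int, n2: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion test.

    Uses Cohen's h = |2*asin(sqrt(p1)) - 2*asin(sqrt(p2))| and a normal
    approximation; the negligible opposite-tail contribution is ignored.
    """
    h = abs(2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2)))
    z_crit = norm.ppf(1 - alpha / 2)
    return float(norm.cdf(h * sqrt(n1 * n2 / (n1 + n2)) - z_crit))


# Hypothetical scenario: same proportions, small vs tenfold larger groups.
small = power_two_proportions(0.30, 0.15, 20, 101)
large = power_two_proportions(0.30, 0.15, 200, 1010)
```

Holding the proportions fixed, the approximate power rises from roughly 0.3 to above 0.99 when both groups grow tenfold, which is the rationale for confirming the subgroup findings in larger cohorts.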
In conclusion, this study reports that the COLEC11 variant rs7567833 and collectin-11 levels are associated with T. cruzi infection. Nevertheless, the decreased collectin-11 levels were not associated with the studied polymorphisms and may be related to the disease process. The COLEC11 rs7567833G and MASP2*CD risk genotypes may act synergistically, increasing the risk of developing chagasic cardiomyopathy. This pioneering study provides insights into the role of collectin-11 and into the combined genetic analysis (COLEC11 and MASP2) of two initiators of the complement response in the clinical presentation of chronic CD. Future functional studies are required to unveil the interaction of collectin-11 with T. cruzi and to investigate the additive/synergistic effect of the COLEC11 and MASP2 genes on the development and clinical expression of CD.
Supporting information S1