CHIT1 at diagnosis predicts faster disability progression and reflects early microglial activation in multiple sclerosis

Multiple sclerosis (MS) is characterized by heterogeneity in disease course, and prediction of long-term outcome remains a major challenge. Here, we investigate five myeloid markers – CHIT1, CHI3L1, sTREM2, GPNMB and CCL18 – in the cerebrospinal fluid (CSF) at diagnostic lumbar puncture in a longitudinal cohort of 192 MS patients. Through mixed-effects and machine learning models, we show that CHIT1 is a robust predictor of faster disability progression. Integrative analysis of 11 CSF and 26 central nervous system (CNS) parenchyma single-cell/nucleus RNA sequencing samples reveals CHIT1 to be predominantly expressed by microglia located in active MS lesions and enriched for lipid metabolism pathways. Furthermore, we find CHIT1 expression to accompany the transition from a homeostatic towards a more activated, MS-associated cell state in microglia. Neuropathological evaluation in post-mortem tissue from 12 MS patients confirms CHIT1 production by lipid-laden phagocytes in actively demyelinating lesions, already in early disease stages. Altogether, we provide a rationale for CHIT1 as an early biomarker for faster disability progression in MS.

For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested
A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.

Data analysis
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

Bénédicte Dubois May 21, 2024
All previously published CNS datasets were downloaded as FASTQ files from the Sequence Read Archive (SRA) under BioProject accession numbers PRJNA544731, PRJNA749443, PRJNA743676 and PRJNA726991. SRA-Toolkit version 3.0.0 was used.
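As an illustration of the download step, the sketch below builds SRA-Toolkit command lines for a single run. It assumes the common `prefetch` followed by `fasterq-dump` workflow, which the text does not specify. The run accession `SRR0000000`, the output directory and the helper name `sra_download_commands` are hypothetical placeholders, and the run accessions belonging to each BioProject would have to be looked up separately (for example in the SRA Run Selector).

```python
import shlex

# BioProject accessions as listed in the text.
BIOPROJECTS = ["PRJNA544731", "PRJNA749443", "PRJNA743676", "PRJNA726991"]


def sra_download_commands(run_accession, outdir="fastq"):
    """Build SRA-Toolkit argument lists for one run accession.

    Assumption: a prefetch + fasterq-dump workflow; the exact commands
    and options used by the authors are not stated in the source.
    """
    return [
        # Fetch the .sra archive for the run into the working directory.
        ["prefetch", run_accession],
        # Convert the archive to FASTQ, writing one file per read.
        ["fasterq-dump", "--split-files", "--outdir", outdir, run_accession],
    ]


# "SRR0000000" is a hypothetical placeholder, not a real run from these projects.
for cmd in sra_download_commands("SRR0000000"):
    print(shlex.join(cmd))
```

The commands are only printed here, not executed; they could be passed to `subprocess.run` once real run accessions are known.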

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences
Behavioural & social sciences
Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Data exclusions
All data that support the main findings in this study are available in the manuscript or the supplementary materials. The previously published CNS scRNA-seq and snRNA-seq datasets were downloaded from the Sequence Read Archive (SRA) under BioProject accession numbers PRJNA544731, PRJNA749443, PRJNA743676 and PRJNA726991. Our previously unpublished CSF scRNA-seq cohort has been uploaded to the SRA under BioProject accession number PRJNA996357. Raw biomarker measurements, processed sc/snRNA-seq data (e.g. differential gene expression), technical sc/snRNA-seq information and details of the included patients are provided in Supplementary Data 1-6.
The term "sex" was used to indicate a biological attribute. Information about sex was collected from medical records (as on the identity cards). Since sex may influence MS severity, we included sex as a covariate in the biomarker analysis (single- and multi-time-point analyses and machine learning models) as described in the methods, results and/or legends. However, this study was not designed with the aim to demonstrate differences between sexes. Disaggregated data based on sex is not available in the source data or supplementary materials.
Since ethnicity may influence MS severity, ethnicity was incorporated among many other variables as a covariate in the machine learning models. Ethnicity was collected from medical records. Ethnicity was not used as a proxy for other variables. This study was not designed with the aim to demonstrate differences between ethnicities.
Age at diagnosis (LP), sex and the rs150192398 genotype for CHIT1 were included as subject-relevant covariates in the single-
With regard to the biomarker analysis, samples with a coefficient of variation of more than 20% or CSF biomarker concentrations below the detection limit were excluded. These exclusion criteria were pre-established. Samples above the upper detection limit were remeasured using a different dilution factor where possible or were eliminated otherwise. For the sc/snRNA-seq analysis, data were excluded as described in the quality control paragraph of the methods. These thresholds for quality control are in line with the current literature for single-cell/nucleus transcriptomics.
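The biomarker exclusion rules can be illustrated with a minimal sketch for one sample measured in duplicate. Several details are assumptions rather than the authors' stated procedure: the coefficient of variation is computed as the sample standard deviation divided by the mean, the detection limits are compared against the duplicate mean, and the function name, status labels and limit values are hypothetical.

```python
from statistics import mean, stdev

CV_THRESHOLD_PCT = 20.0  # from the text: exclude when the CV exceeds 20%


def qc_duplicate(dup_a, dup_b, lower_limit, upper_limit):
    """Return (mean_value, cv_percent, status) for one duplicate measurement.

    Assumptions (not specified in the source): CV = sample SD / mean * 100,
    and detection limits are checked against the mean of the duplicates.
    """
    avg = mean([dup_a, dup_b])
    if avg <= 0 or avg < lower_limit:
        # Below the detection limit: excluded.
        return avg, None, "exclude_below_limit"
    if avg > upper_limit:
        # Above the upper limit: remeasure at another dilution where possible.
        return avg, None, "remeasure_above_limit"
    cv_pct = 100.0 * stdev([dup_a, dup_b]) / avg
    if cv_pct > CV_THRESHOLD_PCT:
        return avg, cv_pct, "exclude_high_cv"
    return avg, cv_pct, "keep"


# Hypothetical concentrations and limits, for illustration only.
print(qc_duplicate(100.0, 110.0, lower_limit=10.0, upper_limit=1000.0))
print(qc_duplicate(100.0, 150.0, lower_limit=10.0, upper_limit=1000.0))
```

For samples that pass, the returned mean corresponds to the value carried into analysis, in line with the statement that mean values across duplicates were used.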

nature portfolio | reporting summary
April 2023

Blinding
Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Outcomes
With regard to the biomarker analysis, CSF protein concentrations were measured in duplicate and blinded to clinical data. Mean values across duplicates were used for analysis. Machine learning models, independent of single- and multi-time-point analyses, replicated that CHIT1 is the most robust predictor for disability progression. For the sc/snRNA-seq analysis, the unsupervised/unbiased sc/snRNA-seq technique was applied to generate an initial hypothesis. Afterwards, the most important results were replicated at the protein level with neuropathological evaluation on post-mortem brain samples.
We allocated patients to the MS group in accordance with the 2017 revised McDonald criteria.
CSF protein concentrations were measured in duplicate and blinded to clinical data. We performed the single- and multi-time-point analyses and machine learning models blinded for disability progression. EDSS scores were used as input for the models, but a distinction between fast and slow progressors was never predefined.
All antibodies used are commercially available and validated by the manufacturer.
No clinical trial.


Novel plant genotypes
Seed stocks

Plants
Describe the methods by which all novel plant genotypes were produced. This includes those generated by transgenic approaches, gene editing, chemical/radiation-based mutagenesis and hybridization. For transgenic lines, describe the transformation method, the number of independent lines analyzed and the generation upon which experiments were performed. For gene-edited lines, describe the editor used, the endogenous sequence targeted for editing, the targeting guide RNA sequence (if applicable) and how the editor was applied.
Report on the source of all seed stocks or other plant material used. If applicable, state the seed stock centre and catalogue number. If plant specimens were collected from the field, describe the collection location, date and sampling procedures.
Describe any authentication procedures for each seed stock used or novel genotype generated. Describe any experiments used to assess the effect of a mutation and, where applicable, how potential secondary effects (e.g. second site T-DNA insertions, mosaicism, off-target gene editing) were examined.
was created with GraphPad Prism v10.0.0. A custom machine learning model as described in D'hondt et al. (Artificial Intelligence in Medicine, 2023) was developed in Python 3.8.10. The code scripts to recreate the main figures and results are accessible in Zenodo via GitHub under the DOI 10.5281/zenodo.11235175 [https://zenodo.org/doi/10.5281/zenodo.11235175]. No unpublished, standalone software/tool/program/package was built.
Reporting on sex and gender
Reporting on race, ethnicity, or other socially

Materials
clinical studies
All manuscripts should comply with the ICMJE guidelines for publication of clinical research and a completed CONSORT checklist must be included with all submissions.