2-Furoic acid associated with the infection of nematodes by Dactylellina haptotyla and its biocontrol potential on plant root-knot nematodes

ABSTRACT Dactylellina haptotyla is a typical nematode-trapping fungus that has garnered the attention of many scholars for its highly effective lethal potential for nematodes. Secondary metabolites play an important role in D. haptotyla-nematode interactions, but which metabolites perform which function remains unclear. We report the metabolic functions based on high-quality, chromosome-level genome assembly of wild D. haptotyla YMF1.03409. The results indicate that a large variety of secondary metabolites and their biosynthetic genes were significantly upregulated during the nematode-trapping stage. In parallel, we identified that 2-furoic acid was specifically produced during nematode trapping by D. haptotyla YMF1.03409 and isolated it from fermentation production. 2-Furoic acid demonstrated strong nematicidal activity with an LD50 value of 55.05 µg/mL against Meloidogyne incognita at 48 h. Furthermore, the pot experiment showed that the number of galls of tomato root was significantly reduced in the experimental group treated with 2-furoic acid. The considerable increase in the 2-furoic acid content during the infection process and its virulent nematicidal activity revealed an essential synergistic effect during the process of nematode-trapping fungal infection. IMPORTANCE Dactylellina haptotyla have significant application potential in nematode biocontrol. In this study, we determined the chromosome-level genome sequence of D. haptotyla YMF1.03409 by long-read sequencing technology. Comparative genomic analysis identified a series of pathogenesis-related genes and revealed significant gene family contraction events during the evolution of D. haptotyla YMF1.03409. Combining transcriptomic and metabolomic data as well as in vitro activity test results, a compound with important application potential in nematode biocontrol, 2-furoic acid, was identified. Our result expanded the genetic resource of D. haptotyla and identified a previously unreported nematicidal small molecule, which provides new options for the development of plant biocontrol agents.

To assess the identity between our new assembly and previous published scaffold-level MHA_v2 assembly, we used NUCMER to perform whole-genome alignment.The identity dot figure was painted by mummerplot function in MUMmer.

Methods S3: Gene and genomic components prediction
The gene prediction was performed by MAKER-2.31.10.Their basic functional annotations were obtained by BLAST and Blast2Go.Besides, we also used BLAST to annotate genes on TCDB and PHI databases.CAZy annotation were obtained by hmmer.
We built a repetitive sequence database of the genome assembly based on the principles of structural prediction and de novo prediction, which was implemented by LTR_FINDER v1.05, MITE-Hunter, RepeatScout v1.0.5, and PILER-DF v2.4.The database was then classified with PASTEClassifier, and then cooperated with Repbase to generate the final repetitive database.
Then the RepeatMasker v4.0.6 software is used to make repeat predictions based on the built repeating repetitive sequence database.
Methods S4: Comparative genomics analysis Gene family clustering was first performed based on all amino acid sequences of the selected species using orthofinder v2.3.12 software (Emms & Kelly, 2019), where MUSCLE v3.8.1551 software was used to do align protein sequences.Using perl scripts to count and plot Venn diagrams.GO (Ashburner et al., 2000) and KEGG (Kanehisa, 2000) analysis were performed using the R package clusterProfiler (Wu et al., 2021) based on shared gene and functional annotations.Multiple sequence alignment was performed for each single-copy gene family using the software muscle (version: 3.8.31)(Edgar, 2004) 3) software was used to estimate the number of gene family members for each branch's ancestor using a birth-mortality model based on species evolutionary trees and gene family clustering results to predict the contraction and expansion of the species' gene family relative to the ancestor.
Methods S5: RNA-seq analysis After passing the assay, total RNA samples were enriched with magnetic beads with Oligo (dT) for eukaryotic mRNA, fragmentation buffer was added to fragment the mRNA, and the fragmented mRNA was used as a template to synthesize the first strand of cDNA with six-base random primers, then added buffer, dNTPs (A, U, G, C), RNase H and DNA polymerase I to synthesize cDNA second strand.The double-stranded cDNA was then purified by magnetic beads and eluted with EB buffer, and the eluted double-stranded cDNA was processed for end repair, base A addition and sequencing junction addition.The entire library preparation was then completed by using magnetic beads for fragment size selection, degradation of U-containing chains, and PCR amplification.
To investigate the expression landscape more comprehensively, we re-predicted the gene with an ensemble method.Genscan, Augustus v2.4,GlimmerHMM v3.0.4,GeneID v1.4, and SNAP were used to perform ab initio gene prediction.GeMoMa was used to perform homologous protein-based gene prediction.The results were integrated by EVM v1.1.1. 11,073 genes were finally identified.
RNA-seq libraries were sequenced by Illumina HiSeq systems.The reads length was pairend 150 bp.Transcriptome quantification was performed by Salmon with default parameters.
The R package DESeq2 was used for differential expressed genes (DEGs) analysis.The DEGs cuf-off was set as p-value < 0.05 and absolute log (Fold Change) > 0.5.R package mfuzz was used for time-series gene expression trend analysis.
Methods S6: Metabolomic data acquisition and statistical analysis Untargeted LC-MS metabolomics was performed on a Dionex UltiMate 3000 LC system coupled with a Q-Exactive Orbitrap mass spectrometer (San Jose, CA) (Thermo, Bremen, Germany).All samples were separated on a XTerra ® MS C18 (150 mm × 3.9 mm, Waters) with a particle size of 5 µm at an LC flow rate of 1.0 ml/min with a flow split ratio of 0. Compound Discoverer (CD version 3.3, Thermo Fisher Scientific) software was used for metabolomics data analysis of raw data file.Three blank samples were used for background subtraction and noise removal during the pre-processing step.The data were analyzed in six groups (P24, P48, D24, D48, PD24, and PD48).The QC samples were used to normalize each individual compound and compensate for instrumental drift.For analysis of the data on metabolite variation in the six groups, simple univariate statistical analyses were carried out on log 2 -transformed data using a paired t-test.Volcano plots were created using these data, with a threshold of p < 0.01 and absolute log 2 fold-change of > 1 set for defining a notable change in compound abundance between PD48 and D48 group.All components were searched and compounds are annotated following the references (Hu et al., 2022;Qu et al., 2023).

Figure S1 .
Figure S1.Infection status at different time points after the addition of nematodes.(A) Adhesive knobs in control group at 12 h after nematode addition.(B) Adhesive knobs in reciprocal group at 12 h after nematode addition.(C) Adhesive knobs in control group at 24 h after nematode addition.(D) Adhesive knobs in reciprocal group at 24 h after nematode addition.(E) Adhesive knobs in control group at 48 h after nematode addition.(F) Adhesive knobs in reciprocal group at 48 h after nematode addition.The adhesive knobs in the diagram are indicated by the red arrow.

Figure S3 .
Figure S3.Divergence time of species.Inner nodes are labeled with the reference differentiation time, and the confidential intervals of differentiation time are in parentheses.The branch length indicates the length of divergence time.

Figure S5 .
Figure S5.GSVA-estimated global expression level of P450 and PHI gene sets."Total" represents the sum set of above 2 sets.

Figure S6 .
Figure S6.(A) Expression dynamic trajectories of DEGs fuzzy-clusters.There exist four consistent groups.(B) The KEGG pathway enrichment matrix of genes from four groups.

Figure S7 .
Figure S7.The volcano plot of PD24 versus D24 group.Significance cutoffs were p = 0.01 (Bayes moderated t-tests) and FC = 1.Each dot represents an individual compound (within ±10 ppm in mass), and the probability of that quantitative observation being statistically significant is indicated by a p value on the y-axis (determined using the standard linear model within SIEVE software).The 71 compounds on the right half of the plot are present at significantly higher levels in the PD24 samples.The 67 compounds on the left-hand portion of the diagram are present at significantly higher levels in the D24 samples.
Methods S2: Genome sequencing and assemblyDNA extraction was performed using the CTAB binding QIAGEN Genomic-tip 20/G and its quality was estimated by Qubit system.7.23 Gbp raw data were obtained by Nanopore longread sequencing platform (Oxford Nanopore Technologies, UK).An in-house script was used to trim the raw sequencing data and 6.98 Gbp clean data were retained.Besides, 4.49 Gbp shortread data was obtained by Illumina HiSeq 4000 system (Illumina, USA).Hi-C libraries were created from ~2 million D. haptotyla YMF 1.03409 cells.These cells were cross-linked and digested by HindIII, the restriction enzyme.The DNA fragment sticky ends were then biotinylated and ligated into chimeric circles.After enrichment, the chimeric circle was sheared and processed into sequencing libraries.The libraries were sequenced by Illumina HiSeq X Ten (Illumina).After quality-check and trim procedures, 7.41 Gbp clean Hi-C data was retained.Canu v1.5 was used to assembly the clean Nanopore reads into primary contigs.Pilon was used to polish the contigs by NGS short reads.Hi-C data were then mapped to the polished contigs by bwa.LACHESIS was used to generate the final D. haptotyla YMF 1.03409 assembly.
3 to the ionization source, and a column temperature of 40°C.Mobile phase A was 0.1% formic acid in water, and mobile phase B was 0.1% formic acid in methanol.The 30 min gradient for positive ESI mode was set as follows: 0-5 min, 2% solvent B; 5-23 min, 2-99% solvent B; 23-28 min, 99% solvent B; and 28.1-35 min, 2% solvent B. The injection volume was 10 μl, and each sample was injected in triplicates.The LC-MS instrument was controlled using Thermo Scientific Xcalibur 4.4 software.

Table S1
Lengths of D. haptotyla YMF 1.03409 chromosomes Table S2 Statistics of repeat elements in D. haptotyla YMF 1.03409 genome and MHA_v2 TableS13Twenty-eight known compounds confidently annotated in PD48 vs D48 experiments Methods S1: Mycelium sample prepared for genome sequencing Preparation of spore suspension: D. haptotyla YMF1.03409 was inoculated on PDA medium at 28 °C for 20 days.Subsequently, the spores were washed off the surface of the mycelium with sterile water on an ultra-clean table to obtain a spore suspension.2) Acquisition of vegetative hyphae: the spore solution was added to potato dextrose broth (PDB) medium and incubated at 28 °C for 14 days at 180 rpm to obtain vegetative hyphae.The vegetative hyphae were separated from the fermentation broth by filtration, and the mycelium was rinsed thrice with double distilled H 2 O. Subsequently, the vegetative hyphae were drained as much as possible with filter paper sheets, and finally, the mycelium was snap-frozen with liquid nitrogen and stored at -80 °C in a refrigerator.