Microbiome dataset of eukaryotic and fungal communities in the bulk soil and root of wild Brassica napus in South Korea

This article describes a dataset of the eukaryotic and fungal microbiomes in the bulk soil and roots of wild Brassica napus at five grassland sites in South Korea. The microbiome datasets were obtained by Illumina MiSeq amplicon sequencing of the 18S rRNA gene and the ITS1 region. The raw sequences and the metadata used for analysis are available at the National Center for Biotechnology Information (NCBI) (BioProject ID: PRJNA821335). Raw reads were processed into amplicon sequence variants (ASVs) using the DADA2 pipeline and taxonomically classified against the SILVA 132 and UNITE reference databases. After quality filtering, a total of 5702 eukaryotic ASVs (1,913,372 reads) and 4565 fungal ASVs (9,032,969 reads) were retained. Rhizaria was the most dominant eukaryotic group at the class level, and Olpidiomycetes was the dominant fungal class in this dataset. Because unintended releases of transgenic B. napus have been reported in South Korea [1], the microbiome datasets produced in this work will serve as a foundation for environmental risk assessments of the potential effects of released transgenic B. napus on natural ecosystems.


Value of the Data
• These eukaryotic and fungal microbiome datasets can be used to understand microbial dynamics in the rhizosphere of wild Brassica napus growing in a natural ecosystem.
• These data are valuable for understanding co-occurrence patterns and interactions among eukaryotes and fungi in the rhizosphere.
• Crop and environmental scientists can use the datasets for environmental risk assessments of transgenic B. napus.

Data Description
The data in this dataset describe the taxonomic profiles of bulk soil and root samples of wild Brassica napus from five grassland sites in South Korea. A total of 199 samples were collected from the bulk soil and roots of B. napus. Amplicon libraries targeting the eukaryotic and fungal communities were constructed and sequenced on the Illumina MiSeq platform. After quality and chimera filtering, as described in the Materials and Methods section, a total of 5702 eukaryotic amplicon sequence variants (ASVs; 1,913,372 reads) and 4565 fungal ASVs (9,032,969 reads) were retained. The raw paired-end FASTQ files and metadata are deposited in the NCBI SRA database under BioProject ID PRJNA821335. The metadata file provides the following information for each sample: primer set, isolation source, date of sample collection, sampling site, and sequencing batch. Processed ASV tables and taxonomic assignments are available at Mendeley Data under the DOI shown in the Specifications table. The rarefaction curves of each sample (Fig. 1) indicate that the sequencing depth was sufficient for further analysis. Fig. 2 displays the relative abundance of the eukaryotic and fungal communities at the class level. Among eukaryotes, Rhizaria (36.7% ± 21.7%) was the most dominant class (Fig. 2A), followed by Holozoa (35.2% ± 23.2%) and Stramenopiles (13.8% ± 16.3%). The relative abundance of Alveolata was approximately five times higher in bulk soil samples than in root samples. In the fungal community, Olpidiomycetes (32.3% ± 35.5%), Sordariomycetes (19.7% ± 14.5%), and Dothideomycetes (16.3% ± 14.1%) were the dominant classes (Fig. 2B). The relative abundances of the major eukaryotic and fungal ASVs are given in Table 1. At the ASV level, the most abundant eukaryotic ASV was assigned to the order Haplotaxida, and the most abundant fungal ASV was assigned to the genus Olpidium (Table 1).
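Class-level summaries of the kind reported above (a mean ± spread across samples, and a bulk soil versus root comparison such as the one for Alveolata) can be computed from a class-level count table. The following is a minimal Python sketch. It assumes that "±" denotes the sample standard deviation across samples and that each sample is a hypothetical dictionary mapping class name to read count. The function names and toy counts are illustrative only, not the authors' code or data.

```python
from statistics import mean, stdev


def relative_abundance(sample_counts):
    """Convert one sample's {class: read count} mapping to percentages."""
    total = sum(sample_counts.values())
    return {taxon: 100 * count / total for taxon, count in sample_counts.items()}


def summarize(samples, taxon):
    """Mean and sample standard deviation (%) of one class across samples.

    A class missing from a sample counts as 0%. stdev() needs at least
    two samples. Assumption: "±" in the text is a standard deviation.
    """
    values = [relative_abundance(s).get(taxon, 0.0) for s in samples]
    return mean(values), stdev(values)


def compartment_ratio(bulk_samples, root_samples, taxon):
    """Ratio of the mean relative abundance in bulk soil to that in roots."""
    bulk_mean, _ = summarize(bulk_samples, taxon)
    root_mean, _ = summarize(root_samples, taxon)
    return bulk_mean / root_mean


# Hypothetical toy counts, not values from this dataset.
bulk = [{"Alveolata": 30, "Rhizaria": 70}, {"Alveolata": 10, "Rhizaria": 90}]
root = [{"Alveolata": 4, "Rhizaria": 96}, {"Alveolata": 4, "Rhizaria": 96}]

print(summarize(bulk, "Alveolata"))                # mean 20.0, SD about 14.14
print(compartment_ratio(bulk, root, "Alveolata"))  # 5.0
```

A ratio of means, as sketched here, can differ from the mean of per-sample ratios. Either could underlie a statement such as "approximately five times higher".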
Sampling sites were selected to include natural habitats of B. napus with diverse plant species and low levels of human disturbance. Plants at the flowering stage and of similar size were selected. Each plant was dug up with an ethanol-sterilized shovel, taking care to minimize root damage, and was then sampled for bulk soil and roots. Bulk soil samples were collected from the soil that fell off the plant after light shaking, and only portions free of plant debris and roots were gathered. After the bulk soil was collected, the plant was shaken vigorously to remove loosely bound soil, and the roots were collected together with the tightly bound soil. Between samples, the shovel, forceps, and blades were cleaned with 70% ethanol and rinsed with sterile water to minimize cross-contamination. The samples were stored at −80 °C until DNA extraction.

Bioinformatic Analysis
To explore the ASV profiles of the eukaryotic and fungal communities, ASVs of the 18S rRNA gene and the ITS region were inferred using DADA2 (version 1.16) in R [5]. The workflow followed the DADA2 pipeline tutorial (version 1.16) for the 18S rRNA gene and the ITS workflow (version 1.8) for the ITS region (accessed March 2022; https://benjjneb.github.io/dada2/tutorial.html and https://benjjneb.github.io/dada2/ITS_workflow.html). For the 18S rRNA gene dataset, reads were filtered with DADA2's 'filterAndTrim' function using the following settings: truncLen = c(250,220), trimLeft = c(16,17), maxN = 0, maxEE = c(2,2), truncQ = 2, rm.phix = TRUE. Chimeric ASVs were removed with the 'removeBimeraDenovo' function using method = 'consensus'. The DADA2-formatted SILVA database (release 132) was used to classify the 18S rRNA gene sequences [6]. For the ITS1 region, reads were filtered with 'filterAndTrim' using the following settings: minLen = 50, maxN = 0, maxEE = c(2,2), truncQ = 2, rm.phix = TRUE. The UNITE database (UNITE general FASTA release for Fungi 2, version 10.05.2021) was used to classify the ITS sequences [7]. Subsequently, sequences assigned to chloroplasts or fungi were removed from the 18S rRNA dataset, and sequences assigned to chloroplasts were removed from the ITS dataset. ASVs represented only by singletons, doubletons, or tripletons were excluded, and only ASVs that appeared in at least two samples were retained for further analysis. Rarefaction curves were constructed using the 'rarecurve' function of the vegan package [8].
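The quality, abundance, and rarefaction steps above can be illustrated with a short Python sketch. It assumes a hypothetical ASV table stored as a dictionary of ASV id to per-sample read counts, with semicolon-separated taxonomy strings. It also interprets "singletons, doubletons, or tripletons" as ASVs with three or fewer reads in total across all samples. expected_errors() computes the expected-error quantity, EE = Σ 10^(−Q/10), that DADA2's maxEE threshold is applied to. rarefied_richness() computes the Hurlbert expected richness that vegan's rarefy(), and hence rarecurve(), evaluates. All names here are illustrative, and this is not the DADA2 or vegan implementation.

```python
from math import comb


def expected_errors(phred_scores):
    """Expected errors of a read: sum of 10^(-Q/10) over its bases."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_max_ee(phred_scores, max_ee=2.0):
    """True if the read would pass a maxEE threshold (2, as in the text)."""
    return expected_errors(phred_scores) <= max_ee


def filter_asvs(counts, taxonomy, drop_terms, min_total=4, min_samples=2):
    """Apply post-classification ASV filters like those described above.

    counts:     {asv_id: [reads in sample 1, reads in sample 2, ...]}
    taxonomy:   {asv_id: "Rank1;Rank2;..."} (hypothetical format)
    drop_terms: lineage names to exclude, matched case-insensitively as
                substrings (so "chloroplast" also matches "Chloroplastida")
    """
    kept = {}
    for asv, row in counts.items():
        lineage = taxonomy.get(asv, "").lower()
        if any(term.lower() in lineage for term in drop_terms):
            continue  # e.g. chloroplast or fungal ASVs in the 18S dataset
        if sum(row) < min_total:
            continue  # only 1-3 reads in total (assumed interpretation)
        if sum(1 for c in row if c > 0) < min_samples:
            continue  # present in fewer than two samples
        kept[asv] = row
    return kept


def rarefied_richness(sample_counts, n):
    """Expected number of ASVs in a random subsample of n reads:
    E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)).
    Exact integer binomials: correct, but slow for very deep samples."""
    present = [c for c in sample_counts if c > 0]
    total = sum(present)
    if not 1 <= n <= total:
        raise ValueError("n must be between 1 and the sample's read count")
    denom = comb(total, n)
    return sum(1 - comb(total - c, n) / denom for c in present)


def rarefaction_curve(sample_counts, step):
    """(subsample size, expected richness) pairs, ending at full depth."""
    total = sum(sample_counts)
    if total == 0:
        return []
    sizes = list(range(1, total + 1, step))
    if sizes[-1] != total:
        sizes.append(total)
    return [(n, rarefied_richness(sample_counts, n)) for n in sizes]


# Hypothetical toy data, not from this dataset.
counts = {"A": [5, 0, 3], "B": [1, 1, 0], "C": [10, 0, 0], "D": [4, 4, 4]}
taxonomy = {"A": "Eukaryota;SAR;Rhizaria", "D": "Eukaryota;Opisthokonta;Fungi"}

print(filter_asvs(counts, taxonomy, ["chloroplast", "fungi"]))  # {'A': [5, 0, 3]}
print(rarefaction_curve([2, 1, 1], step=3))  # [(1, 1.0), (4, 3.0)]
```

In the toy example, ASV B is dropped for having only two reads and C for occurring in a single sample. D is dropped because its lineage matches "fungi", mirroring the removal of fungal sequences from the 18S dataset.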

Ethics Statements
The work did not involve human subjects, animals, cell lines, or endangered species of wild fauna and flora.

Declaration of Competing Interest
The author declares that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.

Data Availability
Raw data for ITS and 18S Metagenomics of wild Brassica napus (Original data) (Mendeley Data).
Eukaryotic and fungal community of wild Brassica napus in South Korea (Original data) (NCBI SRA-PRJNA821335).