Illumina sequencing of the chloroplast genome of common ragweed (Ambrosia artemisiifolia L.)

Common ragweed (Ambrosia artemisiifolia L.) is the most widespread weed and the most dangerous pollen allergenic plant in large areas of the temperate zone. Since the target genes of herbicides such as PSI and PSII inhibitors are located in the chloroplast genome, understanding this genome may indirectly support the exploration of herbicide resistance and the development of novel control methods. The aim of the present study was to sequence and reconstruct the chloroplast genome of A. artemisiifolia and to establish a molecular dataset. We used an Illumina MiSeq protocol to sequence the chloroplast genome from isolated intact organelles of ragweed plants grown in our experimental garden. The assembled chloroplast genome is 152,215 bp long (GC: 37.6%) and has a quadripartite structure, in which a total of 80 protein-coding genes, 30 tRNA genes and 4 rRNA genes were annotated. We also report the complete sequences of the 114 genes encoded in the A. artemisiifolia chloroplast genome, supported by both the MIRA and Velvet de novo assemblers, with contigs ordered against the Helianthus annuus L. chloroplast genome using the Geneious software.



Value of the data
Common ragweed is one of the most aggressive invasive weed species and the most dangerous pollen allergenic plant in large areas of the temperate zone.
Understanding the chloroplast genome of this species may indirectly support its chemical control, since many herbicides have their target genes in the chloroplast genome, e.g. triazine derivatives [1], diphenyl ethers [2] or the redox-active paraquat [3].
The reported data represent an important resource for further chloroplast-based investigations of the species, such as phylogenetic, photosynthetic or oxidative metabolism studies.

Data
Intact chloroplasts were isolated from young leaves of Ambrosia artemisiifolia, followed by cpDNA isolation and sequencing. The raw reads are available in FASTQ format in the NCBI Sequence Read Archive (SRA) under accession SRR6050242. The assembled chloroplast genome and the annotated genes are available in the NCBI Nucleotide database (accession MF362689).

Plant material and isolation of cpDNA
Seeds of an A. artemisiifolia plant grown in our experimental garden were sown on peat, and plants were grown in pots under greenhouse conditions.
In total, 5 g of leaf tissue was collected from young plants about 20 cm tall. To limit starch accumulation, the harvested leaves were incubated in Parafilm-sealed Petri dishes in the dark at 4 °C for 48 h before chloroplast preparation. Chloroplasts were isolated using the Chloroplast Isolation Kit (Sigma-Aldrich, USA) according to the manufacturer's instructions. Intact chloroplasts were separated from broken ones by centrifugation on a 40/80% Percoll® gradient. The percentage of intact chloroplasts was determined with the ferricyanide photoreduction assay [4], in which the reduction of ferricyanide was measured spectrophotometrically at 410 nm. The intactness of the preparation was assessed by comparing the rates of ferricyanide photoreduction with and without osmotic shock of the chloroplasts, using the following formula:

$$\text{Intact chloroplasts (\%)} = \frac{B - A}{B} \times 100$$

where A and B are the changes in absorbance at 410 nm per unit time (min) without and with osmotic shock, respectively, measured by spectrophotometer. The analysis indicated that 81% of the chloroplasts in the Ambrosia preparation were intact, and the preparation was therefore suitable for cpDNA extraction (Fig. 1). cpDNA was isolated as described by Nascimento Vieira et al. [5] with one modification: after the addition of potassium acetate, the sample was kept on ice for 2 h. The procedure was then continued according to the reference protocol until the DNA was dissolved in nuclease-free water.
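The intactness calculation can be sketched as follows. This is a minimal illustration, assuming the usual form of the assay, intact (%) = (B − A)/B × 100, where A and B are the rates of absorbance change at 410 nm without and with osmotic shock. Estimating each rate as a least-squares slope over (time, absorbance) readings is an assumption about how the rates were derived, and the readings in the usage example are hypothetical. Absolute values are used because absorbance at 410 nm decreases as ferricyanide is reduced.

```python
def slope(times, absorbances):
    """Least-squares slope of absorbance vs. time (absorbance units per min)."""
    n = len(times)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two paired (time, absorbance) points")
    mean_t = sum(times) / n
    mean_a = sum(absorbances) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times, absorbances))
    return sxy / sxx


def percent_intact(rate_without_shock, rate_with_shock):
    """Percentage of intact chloroplasts, (B - A) / B * 100.

    A (rate_without_shock): only broken chloroplasts reduce ferricyanide.
    B (rate_with_shock): all chloroplasts reduce ferricyanide after osmotic shock.
    Magnitudes are used because A410 falls as ferricyanide is reduced.
    """
    a = abs(rate_without_shock)
    b = abs(rate_with_shock)
    if b == 0:
        raise ValueError("rate with osmotic shock must be non-zero")
    return (b - a) / b * 100.0


if __name__ == "__main__":
    t = [0, 1, 2, 3]                              # minutes (hypothetical)
    rate_a = slope(t, [1.00, 0.99, 0.98, 0.97])   # without osmotic shock
    rate_b = slope(t, [1.00, 0.95, 0.90, 0.85])   # with osmotic shock
    print(f"{percent_intact(rate_a, rate_b):.1f}% intact")  # 80.0% intact
```

With these made-up readings the slopes are about −0.01 and −0.05 per min, giving 80% intact chloroplasts. Noisy readings can push the result slightly outside 0 to 100%, so in practice replicate measurements would be averaged.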

Library preparation and sequencing
An Illumina paired-end cpDNA library (average insert size 500 bp) was constructed using the Illumina TruSeq library preparation kit according to the manufacturer's protocol. The library was sequenced on the MiSeq platform (Illumina, USA) with 2 × 300 bp paired-end reads.
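As a simple arithmetic illustration of this library design, the expected overlap between the two mates of a pair is 2 × read length − insert size. The sketch assumes that the insert size refers to the fragment length without adapters and that both mates are read to full length before trimming; it is only meaningful when the insert is at least as long as one read.

```python
def mate_overlap(insert_size: int, read_length: int) -> int:
    """Signed overlap (bp) between the two reads of a pair.

    Positive: the mates overlap by this many bases.
    Negative: an unsequenced inner gap of this many bases separates the mates.
    Assumes insert_size excludes adapters and both mates are read in full;
    fragments shorter than read_length would read into the adapter instead.
    """
    if insert_size <= 0 or read_length <= 0:
        raise ValueError("insert_size and read_length must be positive")
    return 2 * read_length - insert_size


if __name__ == "__main__":
    print(mate_overlap(500, 300))  # 100: 2 x 300 bp mates on a 500 bp insert
    print(mate_overlap(700, 300))  # -100: a 100 bp gap would remain
```

Because the average insert is shorter than the combined read length, a typical pair in this library is expected to overlap by roughly 100 bp; the actual insert size distribution spreads around this average.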

Chloroplast genome assembly
Prior to de novo assembly of the cp genome, the quality of the raw paired-end reads (972,060 reads) was checked using FastQC [6]. Based on the FastQC report, low-quality sequences (quality score < 20; Q20) were trimmed and filtered out using GenoUtils, a self-developed application written in C# in the Visual Studio integrated development environment. The remaining high-quality paired-end reads (864,583 reads) were assembled. To create full-length contiguous sequences without the guidance of a reference genome, de novo assembly was performed with the overlap-based assembler MIRA (version 4.0.2) [7] and the de Bruijn graph-based assembler Velvet (version 1.2.10) [8]. The assembled contigs were then ordered against the complete cp genome of Helianthus annuus L. as a reference using Geneious (version 9.1.6) (http://www.geneious.com) [9].
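GenoUtils itself is not reproduced here. The following is a minimal sketch of Q20 quality filtering of paired reads under stated assumptions: qualities are Phred+33 encoded (the encoding used in current Illumina FASTQ output), low-quality bases are trimmed from the 3' end, and a pair is kept only if both mates remain at least `min_length` bases long. The 3'-trimming strategy and the default `min_length` of 50 are hypothetical choices, not the authors' settings.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# (header, sequence, quality string) for one FASTQ record
Record = Tuple[str, str, str]


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, qual: str, threshold: int = 20) -> Tuple[str, str]:
    """Remove bases from the 3' end while their quality is below threshold."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], qual[:end]


def filter_pair(r1: Record, r2: Record, threshold: int = 20,
                min_length: int = 50) -> Optional[Tuple[Record, Record]]:
    """Trim both mates; keep the pair only if both stay >= min_length (hypothetical cut-off)."""
    kept = []
    for header, seq, qual in (r1, r2):
        seq, qual = trim_3prime(seq, qual, threshold)
        if len(seq) < min_length:
            return None  # discard the whole pair to keep mates in sync
        kept.append((header, seq, qual))
    return kept[0], kept[1]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse 4-line FASTQ records from an iterable of text lines."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            next(it)  # '+' separator line
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header.rstrip("\n"), seq, qual
```

In use, the R1 and R2 files would be read in parallel with `zip(read_fastq(f1), read_fastq(f2))`, and only the pairs for which `filter_pair` returns a value would be written out for assembly.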

Gene annotation
The web-based program Dual Organellar GenoMe Annotator (DOGMA, http://dogma.ccbb.utexas.edu/) [10] was used with default parameters to annotate the assembled genome, predicting protein-coding genes as well as tRNA and rRNA genes. The previously reported A. artemisiifolia transcriptome dataset [11] was used to identify the coding regions of cp genes [2]. Subsequently, BLASTN searches against the de novo assembled cp genome were used to further identify the positions of intron-containing genes.
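The region sizes reported for this genome (LSC 84,399 bp, SSC 17,958 bp and two inverted repeats of 24,929 bp each) can be checked for internal consistency, since LSC + SSC + 2 × IR should equal the total length of 152,215 bp. The sketch below assumes the conventional LSC-IRb-SSC-IRa order starting at position 1 of the LSC, so the coordinates it prints are illustrative and need not match those in GenBank record MF362689. The `gc_content` helper is a hypothetical illustration of how a GC percentage is typically computed; ignoring ambiguous bases is an assumption.

```python
GENOME_LENGTH = 152_215
LSC, SSC, IR = 84_399, 17_958, 24_929


def quadripartite_layout(lsc: int, ir: int, ssc: int):
    """Return (name, start, end) tuples, 1-based inclusive, in LSC-IRb-SSC-IRa order."""
    regions = []
    start = 1
    for name, length in (("LSC", lsc), ("IRb", ir), ("SSC", ssc), ("IRa", ir)):
        end = start + length - 1
        regions.append((name, start, end))
        start = end + 1
    return regions


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    layout = quadripartite_layout(LSC, IR, SSC)
    for name, start, end in layout:
        print(f"{name}: {start:,}-{end:,}")
    # The last region must end exactly at the reported genome length
    assert layout[-1][2] == GENOME_LENGTH
    # The reported gene classes add up to the reported total
    assert 80 + 30 + 4 == 114
```

Under these assumptions the regions span LSC 1-84,399, IRb 84,400-109,328, SSC 109,329-127,286 and IRa 127,287-152,215, so the reported sizes add up exactly to the full genome length.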
The complete chloroplast genome of A. artemisiifolia is 152,215 bp long (GC content: 37.6%). It has a quadripartite structure consisting of the large single-copy (LSC) and small single-copy (SSC) regions of 84,399 bp and 17,958 bp, respectively, separated by a pair of inverted repeats (IRa and IRb) of 24,929 bp each. A total of 114 genes were annotated, including 80 protein-coding genes, 30 tRNA genes and 4 rRNA genes. Six of the protein-coding genes and the 3' exon of rps12 are duplicated in the IR regions, as are seven of the tRNA genes and all four rRNA genes. The presence of

Table 1. Classification of genes after chloroplast genome reconstruction. The annotated genes were categorized according to their function. Notation: underlined, contains one intron; underlined bold, contains more than one intron.