Dataset of next-generation sequence reads of nanobody clones in phage display library derived from Indian desert camel (Camelus dromedarius L.)



Abstract
A next-generation sequencing (NGS) dataset of the nanobody (Nb) clones in a phage display library (PDL) is valuable in several ways: (i) estimating the library size, (ii) improving the selection and identification of Nbs, (iii) determining the frequency of V gene families and the diversity and length of CDRs, and (iv) high-resolution analysis of natural and synthetic libraries [1], [2], [3]. To obtain a dataset of NGS reads of Nbs, we used a fraction of our previously constructed PDL of Nbs derived from an Indian desert camel immunized with E. coli lipopolysaccharide (LPS). The cryo-preserved transformant library was revived to extract the DNA pool of Nb-encoding VHH inserts in the pHEN4 vector. This DNA was used as template to amplify the VHH pool by PCR [6]. The VHH amplicon band was gel-purified and sequenced on the Illumina MiSeq™ platform. The 'Nextera XT micro V2 Index' kit was used for sequencing the Nb library DNA sample, with the index adapters 'i7' (N706: TAGGCATG) and 'i5' (S517: GCGTAAGA). The raw data comprised a total of 182,146 reads (matched = 179,591; unmatched = 2555), with an average read length of 130.33 bases and a total of 23.74 Mb. Of the 179,591 matched reads, 142,004 were paired and 37,587 were broken pairs. The raw NGS reads were submitted to the NCBI Sequence Read Archive, accessible at URL: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA516512 (dataset ref. [7]), and after analysis were deposited in the Mendeley Datasets repository, accessible at URL: https://data.mendeley.com/datasets/4rsz3snvk5/3 (dataset ref. [8]). The sequence reads were analyzed with bioinformatics tools [9], [10], [11], [12]. The assembled consensus contigs revealed Nb orthologs of diverse antigen (Ag) specificities, including orthologs of functional Nbs previously isolated by conventional panning and Sanger-sequenced.
Contig 1 CDR1-3 matched those of anti-Trypanosoma evansi RoTat1.2 variant surface glycoprotein (VSG) Nb clones, while contig 2 CDR1-3 matched those of anti-LPS Nb clones isolated from the library. Contig 3, however, was incomplete and lacked CDR3. Despite its limited depth, the NGS dataset is a useful guide for selecting antigen-specific Nbs from the library, as demonstrated by the anti-T. evansi VSG Nbs, and it provides templates for Nb-based diagnostic reagents and therapeutic agents.

Specifications Table
Subject: Biological Sciences: Immunology
Specific subject area: Immunology: Domain antibodies or nanobodies library; phage display technology platform
Type of data: [...]

Value of the Data
• The NGS dataset is a resource of endotoxin-neutralizing and other biotechnologically and clinically useful nanobody clones derived, for the first time, from an immunized Indian desert camel.
• Investigators interested in comparative studies of mammalian Ig-VH repertoire evolution, and those interested in novel nanobody-based drug design, will benefit most.
• The dataset provides phylogenetically related templates for reformatting and redesigning novel nanobody constructs.

Dataset of NGS reads of Nbs in phage display library derived from LPS-immunized Indian desert camel
The cryo-preserved culture of the TG1 transformant library was revived. The insert-vector (VHH-pHEN4) DNA extracted and purified from 100 mL of the transformant library culture had a concentration of 70 μg/mL and an A260/A280 ratio of 2.0. The library of nanobody-encoding VHH clones was amplified by PCR from the purified plasmid DNA, and the PCR products were purified from an agarose gel. PCR products resolved by agarose gel electrophoresis (AGE) are shown in Fig. 1. [...] MiSeq™ system. The raw NGS data were deposited at the NCBI Sequence Read Archive (SRA), and run SRR8477290 is accessible at URL: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA516512 (dataset ref. [7]). After preliminary analysis, the NGS data files were submitted on December 05, 2020 to the Mendeley Datasets Repository, and the published dataset can be accessed at URL: https://data.mendeley.com/datasets/4rsz3snvk5/3 (dataset ref. [8]). Preliminary examination using CLC Genomics Workbench 12.0 (Qiagen Bioinformatics) revealed 91,073 sequence reads in each file (Mendeley dataset files [8]). The total read count was 182,146 (matched = 179,591; unmatched = 2555), with an average read length of 130.33 bases and a total of 23.74 Mb. Of the 179,591 matched reads, 142,004 were paired and 37,587 were broken pairs; 77,249 reads were mapped and 13,824 were not mapped (Mendeley dataset files: ABT-28_S24_L001_R1_001_24 (paired) assembly summary report; ABT-28_S24_L001_R2_001 assembly coverage analysis report; ABT-28_S24_L001_R1_001 mapping summary report [8]). More than 80% of bases had a quality score of at least Q30 (at most 1 incorrect base call in 1000) at 2 × 150 bp, and the overall QC report was acceptable (Mendeley dataset files: ABT-28_S24_L001_R2_001 duplicated sequences QC report; ABT-28_S24_L001_R2_001 supplementary QC report [8]).
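The read statistics reported above are internally consistent, and the Q30 threshold follows from the Phred definition Q = -10 log10(P), where P is the probability of an incorrect base call. The short Python sketch below cross-checks the reported counts and converts a Phred score to an error probability; the numbers are copied from this article, while the function names are illustrative only.

```python
# Values reported in this article for run SRR8477290
TOTAL_READS = 182_146
MATCHED, UNMATCHED = 179_591, 2_555
PAIRED, BROKEN_PAIRED = 142_004, 37_587
READS_PER_FILE = 91_073          # reported for each of the two read files
MAPPED, NOT_MAPPED = 77_249, 13_824
MEAN_READ_LENGTH = 130.33        # bases


def phred_error_probability(q: float) -> float:
    """Probability that a base call is wrong, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def check_read_counts() -> dict:
    """Cross-check the reported counts and derive the total yield in Mb."""
    assert MATCHED + UNMATCHED == TOTAL_READS
    assert PAIRED + BROKEN_PAIRED == MATCHED
    assert 2 * READS_PER_FILE == TOTAL_READS
    # Mapped + not-mapped adds up to the per-file read count
    assert MAPPED + NOT_MAPPED == READS_PER_FILE
    total_mb = TOTAL_READS * MEAN_READ_LENGTH / 1e6
    return {"total_mb": round(total_mb, 2), "q30_error": phred_error_probability(30)}


if __name__ == "__main__":
    print(check_read_counts())  # total_mb rounds to 23.74; q30_error is 1 in 1000
```

Multiplying the read count by the mean read length (182,146 × 130.33 bases) reproduces the reported 23.74 Mb yield.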
IgBLAST (https://www.ncbi.nlm.nih.gov/igblast/igblast.cgi) of the contig 1, 2 and 3 consensus sequences revealed human VH subfamily III-like features of the camel VHH and identified framework regions (FR) 1-4 and complementarity-determining regions (CDR) 1-3 (Mendeley dataset files: [...]). Table 3 shows that contigs 1 and 2 contain CDR1, CDR2 and CDR3, whereas the shorter contig 3 contains only CDR1 and CDR2 (Mendeley dataset file: CDR1-3 in NGS consensus contig 1-3 of PDL_Idc matched with anti-LPS and anti-T. evansi VSG Nb clones [8]). Table 3 also shows that contig 1 CDR1, CDR2 and CDR3 are identical to those of the anti-T. evansi VSG Nb clones that were isolated from the library and sequenced by Sanger's automated technique (Mendeley dataset files: IgBLAST-MW310247 (anti-T evansi RoTat1.2 VSG Nb Cl11); IgBLAST-MW310248 (anti-T evansi RoTat1.2 VSG Nb Cl17); CDR1-3 in NGS consensus contig 1-3 of PDL_Idc matched with anti-LPS and anti-T. evansi VSG Nb clones [8]). Contig 2 CDR1 and CDR2 are identical to those of the LPS-binder Nb clones previously isolated from the library and Sanger-sequenced, whereas contig 2 CDR3 differs from four LPS-binder clones at a single amino acid position and from one LPS-binder clone at two positions. In one randomly picked, Sanger-sequenced clone from the library (EU429319), CDR1 and CDR3 were similar to those of the LPS-binder clones, whereas CDR2 was entirely different. The [...] As demonstrated by the isolation of anti-T. evansi RoTat1.2 VSG Nb clones, this PDL-NGS read-guided approach could be used to search for Nbs specific to other antigens and for templates suitable for Nb reformatting and Nb-based drug design. Fig. 3 presents the phylogenetic tree showing the relatedness of the contig 2 consensus sequence to various nanobodies. This homology-based evidence led us to successfully select T. evansi Ag-specific Nb clones by panning our phage display library.
Likewise, we are attempting to isolate Nb clones that bind Ags of veterinary pathogens, such as Pasteurella multocida B:2 LPS, foot-and-mouth disease (FMD) virus serotypes and Staphylococcus aureus exotoxins. As mentioned above, homology-based evidence also exists for various Nbs against other animal and human pathogens and against VEGF, supporting our view that the phage display library in its existing form is a valuable Nb resource of Indian origin.
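The CDR comparisons described above (identical CDRs, or CDR3s differing at one or two positions) amount to counting amino acid differences between aligned CDR strings. A minimal Python sketch, assuming the CDRs have already been extracted (for example from IgBLAST output) and are of equal length; the sequences below are hypothetical placeholders, not the CDRs listed in Table 3.

```python
def mismatch_positions(query: str, reference: str) -> list[int]:
    """1-based positions at which two equal-length CDR sequences differ."""
    if len(query) != len(reference):
        raise ValueError("CDRs differ in length; align them before comparing")
    return [i + 1 for i, (q, r) in enumerate(zip(query, reference)) if q != r]


def compare_cdr(query: str, references: dict[str, str]) -> dict[str, int]:
    """Number of amino acid mismatches between a contig CDR and each clone's CDR."""
    return {name: len(mismatch_positions(query, ref)) for name, ref in references.items()}


# Hypothetical placeholder CDR3 sequences (not the sequences reported in this study)
contig_cdr3 = "AARDSTYYDY"
clone_cdr3s = {"clone_A": "AARDSAYYDY", "clone_B": "AKRDSAYYDY"}

if __name__ == "__main__":
    print(compare_cdr(contig_cdr3, clone_cdr3s))
```

CDRs of unequal length would need a pairwise alignment first; the sketch deliberately refuses to compare them rather than silently truncating.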

The phage display library of nanobodies derived from Indian desert camel: the background information
The present NGS dataset was obtained from a previously constructed and cryo-preserved Nb-phage display library derived from an Indian desert camel (PDL-Idc). The PDL-Idc was constructed in our laboratory during 2006-07 according to Ghahroudi et al. [4], with some modifications.
Standard rDNA and molecular cloning protocols were followed [5]. The 370-420 bp Nb PCR amplicon pool was directionally cloned into the PstI and NotI sites of the phage display vector pHEN4, allowing expression of Nbs carrying a C-terminal decapeptide hemagglutinin (HA) tag, fused to the M13 phage pIII minor coat protein, in the transformed amber-suppressor E. coli strain TG1. With a transformation efficiency of about 80%, the Nb-pHEN4 transformant TG1 library was estimated to comprise > 5 × 10⁷ Nb clones. The TG1 library clones were cryopreserved in LB/amp/glu/30% glycerol at -70 °C. When required, the library of Nb-displaying phages was rescued by infecting the TG1 transformants with M13K07 helper phage. From the phage library, Nb-displaying phage clones binding E. coli LPS and S. aureus β-hemolysin were selected by panning following a standard protocol, expressed and characterized [6]. Subsequently, a few Nb clones of LPS-binders and S. aureus hlb-binders were expressed, purified, characterized and sequenced by Sanger's automated technique. Nucleotide sequences of representative Nb clones were submitted to NCBI GenBank [http://www.ncbi.nlm.nih.gov/BankIt/]. Immuno- and bioinformatics analyses revealed that these sequences (GenBank Accession nos. EU429319, EU861212, KF990215, KF990216, KF990217 and GU014816) had the camelid VHH hallmark amino acids S11, F37, E44 and R45, and were phylogenetically closer to Arabian camel VHH sequences than to those of llama and other New World camelids.
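Checking a sequence for the VHH hallmark residues named above can be expressed as a lookup on a numbered sequence. The Python sketch below assumes the sequence has already been numbered by an external tool with the same scheme used to quote the positions (a Kabat-style scheme is assumed here); the residues in the example are hypothetical, one set VHH-like and one resembling a conventional VH.

```python
# VHH hallmark residues named in the text, keyed by position number.
# Assumption: positions follow the same (Kabat-style) numbering used to quote them.
VHH_HALLMARKS = {11: "S", 37: "F", 44: "E", 45: "R"}


def hallmark_report(numbered: dict[int, str],
                    hallmarks: dict[int, str] = VHH_HALLMARKS) -> dict[int, bool]:
    """For each hallmark position, report whether the expected residue is present.

    `numbered` maps position number -> one-letter residue (produced elsewhere);
    positions missing from `numbered` are reported as False.
    """
    return {pos: numbered.get(pos) == aa for pos, aa in hallmarks.items()}


def has_all_hallmarks(numbered: dict[int, str]) -> bool:
    """True only if every hallmark position carries the expected residue."""
    return all(hallmark_report(numbered).values())


# Hypothetical numbered residues (only the hallmark positions are shown)
vhh_like = {11: "S", 37: "F", 44: "E", 45: "R"}
vh_like = {11: "L", 37: "V", 44: "G", 45: "L"}

if __name__ == "__main__":
    print(has_all_hallmarks(vhh_like), has_all_hallmarks(vh_like))
```

The numbering step itself is left out on purpose, because residue positions in a raw sequence do not coincide with scheme positions once insertions are present.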

Extraction of VHH-pHEN4 phagemid DNA from TG1 transformant library for NGS
The transformant library was revived by thawing at room temperature (RT). An aliquot (1.5 mL) of the library culture was inoculated into 150 mL of LB/Amp broth in sterilized conical flasks and grown at 37 °C with shaking for 20 h [6]. VHH-pHEN4 phagemid DNA was extracted from 100 mL of this culture of the TG1 transformant library and purified using a commercially available maxi-column (Qiagen, Germany) according to the manufacturer's protocol. The purified DNA was quantified by A260, and its purity was assessed by agarose gel electrophoresis (AGE) and the A260/A280 ratio.
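A260-based quantification conventionally uses a conversion factor of 50 μg/mL per absorbance unit for double-stranded DNA (1 cm path length). The minimal Python sketch below applies that factor and computes the A260/A280 ratio. The readings and the 1:10 dilution in the example are hypothetical values chosen to reproduce the 70 μg/mL and A260/A280 = 2.0 reported for the library DNA.

```python
# Conventional conversion factor: A260 of 1.0 (1 cm path) corresponds to ~50 ug/mL dsDNA
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """dsDNA concentration (ug/mL) of the undiluted sample from a diluted A260 reading."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def a260_a280_ratio(a260: float, a280: float) -> float:
    """Purity ratio; both readings come from the same dilution, so it cancels out."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


if __name__ == "__main__":
    # Hypothetical readings of a 1:10 dilution
    a260, a280 = 0.14, 0.07
    print(dsdna_concentration(a260, dilution_factor=10), a260_a280_ratio(a260, a280))
```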

VHH-PCR of Nb clones
VHH-PCR was performed to amplify the VHH clones (which encode the Nb clones) using the VHH-specific primers A6E and FR4For [6]. Ten reactions, each with a final volume of 50 μL, were set up in 0.2 mL PCR tubes by adding the components in the following order: nuclease-free water, 2x PCR master mix (DreamTaq Green master mix, Thermo Scientific), 200 pM of each primer and 5.0 μL of template DNA (the VHH-pHEN4 DNA purified above). The DNA was amplified in an ABI thermocycler using the following program: 94 °C for 4 min; 30 cycles of denaturation at 94 °C for 1 min, annealing at 63 °C for 1 min and extension at 72 °C for 1 min; final extension at 72 °C for 10 min; and a hold at 4 °C. The PCR product was stored at -20 °C until further use.
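Writing the cycling program above as data makes its total hold time easy to check (4 + 30 × 3 + 10 = 104 min, excluding ramp times and the open-ended 4 °C hold). The Python sketch also splits each 50 μL reaction into 2x master mix, template and remaining volume; the split of the remainder between primers and water depends on primer stock concentrations not given in the text, so it is reported as one figure. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float


# Program as described in the text
INITIAL = [Step("initial denaturation", 94, 4)]
CYCLE = [Step("denaturation", 94, 1), Step("annealing", 63, 1), Step("extension", 72, 1)]
N_CYCLES = 30
FINAL = [Step("final extension", 72, 10)]


def hold_minutes(initial, cycle, n_cycles, final) -> float:
    """Sum of step hold times; ramping and the 4 °C hold are not counted."""
    return (sum(s.minutes for s in initial)
            + n_cycles * sum(s.minutes for s in cycle)
            + sum(s.minutes for s in final))


def reaction_volumes(n_reactions=10, reaction_ul=50.0, template_ul=5.0, mix_fold=2.0):
    """Per-reaction and total volumes (uL) for a master mix supplied at `mix_fold`x."""
    mix_ul = reaction_ul / mix_fold
    remainder_ul = reaction_ul - mix_ul - template_ul  # primers + nuclease-free water
    if remainder_ul < 0:
        raise ValueError("template and master mix exceed the reaction volume")
    per_rxn = {"master_mix": mix_ul, "template": template_ul,
               "primers_and_water": remainder_ul}
    return per_rxn, {k: v * n_reactions for k, v in per_rxn.items()}


if __name__ == "__main__":
    print(hold_minutes(INITIAL, CYCLE, N_CYCLES, FINAL))
    print(reaction_volumes())
```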

Agarose gel electrophoresis (AGE) of PCR products for extraction and purification
PCR products were resolved by AGE on a 2% agarose gel in 1X tris-acetate-EDTA buffer, pH 8.3 (1X TAE), in a mini-gel electrophoresis apparatus at 5 V/cm of gel length [5]. A 100 bp DNA ladder (Thermo Scientific, USA) was run to determine the size of the PCR products. The resolved VHH-PCR products were visualized at 312 nm in a gel documentation system, and the 370-420 bp bands were excised [5]. The gel pieces were transferred to 1.5 mL Eppendorf microfuge tubes, and the DNA was extracted and purified using a commercial gel extraction kit (Qiagen, Germany). The DNA was eluted in 10 mM tris-HCl, pH 8.5, and stored at -20 °C until further use. DNA was quantified by A260, and its purity was assessed by the A260/A280 ratio.
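Sizing bands against a DNA ladder is commonly done with a semi-log standard curve, in which log10(fragment size) is approximately linear in migration distance over the resolving range of the gel. A minimal Python sketch of that technique, fitted by ordinary least squares; it also computes the applied voltage for 5 V/cm of gel length and checks whether an estimate falls in the 370-420 bp window that was excised. The ladder distances and the 10 cm gel length are hypothetical, not measurements from this study.

```python
import math


def fit_semilog(ladder):
    """Least-squares fit of log10(size_bp) = slope * distance + intercept.

    `ladder` is a list of (migration_distance, size_bp) pairs for marker bands.
    """
    xs = [d for d, _ in ladder]
    ys = [math.log10(bp) for _, bp in ladder]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two ladder bands")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_size(distance, slope, intercept):
    """Interpolated fragment size (bp) for a band that migrated `distance`."""
    return 10 ** (slope * distance + intercept)


def in_excision_window(size_bp, low=370, high=420):
    """True if an estimated size lies in the band range excised in this study."""
    return low <= size_bp <= high


def run_voltage(gel_length_cm, v_per_cm=5.0):
    """Applied voltage for a field strength given per cm of gel length."""
    return v_per_cm * gel_length_cm


if __name__ == "__main__":
    # Hypothetical migration distances (mm) for some 100 bp ladder bands
    ladder = [(18.0, 1000), (24.0, 700), (30.0, 500), (33.5, 400),
              (37.5, 300), (43.0, 200), (52.0, 100)]
    slope, intercept = fit_semilog(ladder)
    size = estimate_size(33.0, slope, intercept)
    print(round(size), in_excision_window(size), run_voltage(10))
```

Interpolation is only reliable between the outermost ladder bands; extrapolating beyond them is not advisable.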

NGS of the library of Nb clones by Illumina® MiSeq™ system
The DNA library of Nb sequences was sequenced on the Illumina® MiSeq™ system available in the Department of Animal Biotechnology, LUVAS, Hisar. The NGS data of the library DNA sample were acquired and initially analyzed with the software available on the platform. The 'Nextera XT micro V2 Index' kit (with a total output of 800 Mb and 8 million paired-end reads of 150 bp length) was used for sequencing the Nb library DNA sample, with the following index adapters: 'i7' (N706: TAGGCATG) and 'i5' (S517: GCGTAAGA). Raw NGS FASTQ data files were deposited at the NCBI Sequence Read Archive (SRA). The SRA data with Accession no. PRJNA516512 were released for public access on 22 February 2020, and run SRR8477290 is accessible at URL: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA516512 (dataset ref. [7]). After preliminary analysis, the dataset was deposited at the Mendeley Datasets Repository; version 3, accessible at URL: https://data.mendeley.com/datasets/4rsz3snvk5/3, was published on 08 December 2020 (dataset ref. [8]).
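Read counts, mean read length and the fraction of bases at or above Q30, as reported for this run, can be recomputed from the deposited FASTQ files. A minimal Python sketch, assuming standard four-line FASTQ records with Phred+33 quality encoding (the encoding used by current Illumina instruments); the function names are illustrative.

```python
import gzip
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(path: str) -> Iterator[Record]:
    """Yield records from a plain or gzip-compressed four-line FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        while True:
            header = fh.readline()
            if not header:
                break
            seq = fh.readline().rstrip("\r\n")
            fh.readline()  # '+' separator line
            qual = fh.readline().rstrip("\r\n")
            if not header.startswith("@") or len(seq) != len(qual):
                raise ValueError(f"malformed FASTQ record: {header.strip()}")
            yield header.rstrip("\r\n"), seq, qual


def fastq_stats(records: Iterable[Record], q_threshold: int = 30, offset: int = 33) -> dict:
    """Read count, mean read length and fraction of bases with Phred score >= q_threshold."""
    n_reads = n_bases = n_pass = 0
    for _, seq, qual in records:
        n_reads += 1
        n_bases += len(seq)
        n_pass += sum(1 for c in qual if ord(c) - offset >= q_threshold)
    return {
        "reads": n_reads,
        "mean_length": n_bases / n_reads if n_reads else 0.0,
        "fraction_ge_threshold": n_pass / n_bases if n_bases else 0.0,
    }
```

Usage, with a hypothetical local file name: `fastq_stats(read_fastq("SRR8477290_1.fastq.gz"))`. Because `fastq_stats` accepts any iterable of records, it can also be applied to both read files chained together.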

Bioinformatics for determination of genetic diversity of the nanobodies library
Quality scoring of the raw NGS reads, adapter trimming and related preprocessing were performed with the MiSeq™ system software. Contigs representing full-length nucleotide sequences of Nb clones were assembled using the De Novo Assembler in CLC Genomics Workbench 12.0 (Qiagen Bioinformatics). Basic molecular biology tools available at NCBI and in the CLC software were also used. Copy numbers of the sequence reads in the library were estimated. Database searches were done using NCBI Nucleotide BLAST [URL: https://blast.ncbi.nlm.nih.gov/Blast.cgi] and IgBLAST [URL: https://www.ncbi.nlm.nih.gov/igblast/igblast.cgi] [9][10][11][12]. The dataset analysis files are available at the Mendeley dataset repository (dataset ref. [8]).
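The text does not specify how copy numbers were estimated. One simple approach, shown here as a hedged Python sketch rather than the method used in this study, is to count identical sequences and report each one's share of the total. Since reads of about 130 bases cannot span a full 370-420 bp VHH insert, such counting is probably more informative on assembled sequences or extracted CDR3s than on raw reads.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def copy_numbers(sequences: Iterable[str], min_length: int = 1) -> List[Tuple[str, int, float]]:
    """Count identical sequences; return (sequence, copies, fraction), most abundant first."""
    cleaned = (s.strip().upper() for s in sequences)
    counts = Counter(s for s in cleaned if len(s) >= min_length)
    total = sum(counts.values())
    return [(seq, n, n / total) for seq, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical input; in practice these could be CDR3 strings from IgBLAST output
    for seq, n, frac in copy_numbers(["acgt", "ACGT", "TTTT"]):
        print(seq, n, f"{frac:.2f}")
```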

Ethics Statement
The work to obtain and analyse the NGS dataset of the phage display library did not involve the use of humans or animals.

Declaration of Competing Interest
The phage display library of nanobodies derived from an LPS-immunized Indian desert camel was produced in 2009 in a research project funded by the Department of Biotechnology (DBT), Government of India, and implemented by the second author as Principal Investigator at Ch. Charan Singh Haryana Agricultural University (CCSHAU), Hisar, Haryana. The library is an Nb resource of potential commercial interest and is the joint intellectual property of the funding agency (DBT) and CCSHAU according to mutually agreed terms and conditions. NGS of the Nbs in this library was funded by ICAR, New Delhi, under the Emeritus Scientist scheme implemented at LUVAS, Hisar. Any commercial interests arising from these findings will be jointly settled by LUVAS, Hisar, and ICAR, New Delhi. However, these declared 'Conflicts of Interest' did not influence the results of the present investigation.