GrainGenes: centralized small grain resources and digital platform for geneticists and breeders.

GrainGenes (https://wheat.pw.usda.gov or https://graingenes.org) is an international centralized repository for curated, peer-reviewed datasets useful to researchers working on wheat, barley, rye and oat. GrainGenes manages genomic, genetic, germplasm and phenotypic datasets through a dynamically generated web interface for facilitated data discovery. Since 1992, GrainGenes has served geneticists and breeders in both the public and private sectors on six continents. Recently, several new datasets were curated into the database along with new tools for analysis. The GrainGenes homepage was enhanced by making it more visually intuitive and by adding links to commonly used pages. Several genome assemblies and genomic tracks are displayed through the genome browsers at GrainGenes, including the Triticum aestivum (bread wheat) cv. 'Chinese Spring' IWGSC RefSeq v1.0 genome assembly, the Aegilops tauschii (D genome progenitor) Aet v4.0 genome assembly, the Triticum turgidum ssp. dicoccoides (wild emmer wheat) cv. 'Zavitan' WEWSeq v.1.0 genome assembly, a T. aestivum (bread wheat) pangenome, the Hordeum vulgare (barley) cv. 'Morex' IBSC genome assembly, the Secale cereale (rye) select 'Lo7' assembly, a partial hexaploid Avena sativa (oat) assembly and the Triticum durum cv. 'Svevo' (durum wheat) RefSeq Release 1.0 assembly. New genetic maps and markers were added and can be displayed through CMAP. Quantitative trait loci, genetic maps and genes from the Wheat Gene Catalogue are indexed and linked through the Wheat Information System (WheatIS) portal. Training videos were created to help users query and reach the data they need. GSP (Genome Specific Primers) and PIECE2 (Plant Intron Exon Comparison and Evolution) tools were implemented and are available to use. As more small grains reference sequences become available, GrainGenes will play an increasingly vital role in helping researchers improve crops.


Introduction
Since 1992, GrainGenes has served as an international centralized hub for peer-reviewed biological data for small grains researchers working on wheat, barley, rye and oat. Its content includes genetic maps, markers, traits and phenotypes (1,2). With a range of query and visualization tools, including a Structured Query Language (SQL) interface, GrainGenes serves information to enable breeders and geneticists to improve small grain varieties and to better understand the biology of important crops at a molecular level. Since wheat alone constitutes 20% of the calories and 20% of proteins consumed by humans (3), access to the wealth of data in crop databases like GrainGenes contributes to worldwide food security.
Reflecting the global importance of small grains and crop diversity, the userbase of GrainGenes is distributed across six continents. More than half of the GrainGenes users are located in three countries: the USA, China and India. These users are harnessing the resources at GrainGenes to understand the genetic and genomic bases of traits and to develop plants that can combat the continuing challenges from negative biotic and abiotic conditions, such as drought, flood, diseases and pests.
To serve its users, GrainGenes prioritizes and curates peer-reviewed small grains datasets and displays them through dynamically generated web pages. These pages link to tools at GrainGenes and to other biological databases where users can learn more about their biological entities of interest (e.g. genes, markers, genomic regions). Here we describe the most recent additions to the curated data content and tools at GrainGenes that are available to our users. New datasets include results from genomic experiments, characterizations of genes conferring disease resistance and other plant traits and genome sequences from hexaploid 'Chinese Spring' (bread wheat), tetraploid 'Zavitan' (wild emmer wheat) and 'Morex' (barley). New tools include genome browsers for assembled genomes and an expanded collection of BLAST databases. We also improved our front page to make it more visual and intuitive for better user experience.

The GrainGenes portal
The GrainGenes website serves as an information hub for the small grains community and includes linked data sets, job listings and upcoming events. The web interface of GrainGenes is coded in HTML, PHP, Perl and JavaScript, is supported by a MySQL backend database and is wrapped by an instance of the Drupal (drupal.org) content management system. According to Google Analytics, in the 2018 calendar year, GrainGenes was used by 26 433 users (based on unique IPs) in 69 858 sessions and provided 460 291 page views. The main data types at GrainGenes are DNA sequences, quantitative trait loci (QTLs), genetic maps, germplasm, genes, genetic markers (loci, probes, alleles), trait studies, proteins, images and colleague records, among others (Supplementary Figure 1, as of January 2019). GrainGenes has curated information for many genera, including Triticum (354), Aegilops (50), Secale (40) and Avena (147). The numbers in parentheses include various species, historical names and aneuploids.

Enhanced homepage
The GrainGenes team has recently made extensive enhancements to the GrainGenes Homepage to improve content, ensure consistent visual communication and provide means for user feedback and interaction (Figure 1). Recent improvements to this interface include the addition of links for data download, access points for the GrainGenes Grains and OatMail mailing lists and tutorials. The visibility of 'Species Portals' on the front page has also been improved. These include the Annual Wheat Newsletter, the Barley Genetics Newsletter and the Oat Newsletter. The 'GrainGenes Updates' have been itemized and dated, and new 'Quick Links' have been added for the database browser, the GrainGenes genome browsers and the revamped CMap page (4,5; https://wheat.pw.usda.gov/GG3/CMAP/). To improve communication with our users, a 'Feedback' button has now been added to the header of each page, promoting a 24-hour response time to assure users that their feedback is valued. The GrainGenes team

Genomic data visualization
Driven by lower sequencing and computational costs (6), more genomes are being sequenced and assembled.

New curated genetic content
Since 2016, several sets of high-impact genetic markers and maps have been added to the GrainGenes database (16-27; Table 1). These data are displayed through a CMap (5) instance, thus adding value to the data content at GrainGenes for genetic maps, QTLs, loci, genes, DNA sequences and germplasm. Table 1 is a brief summary of newly curated genetic data sets (28).
Collecting this information has involved developing and maintaining close collaborations with many groups, for example MASWheat (http://maswheat.ucdavis.edu), an extensively curated site maintained at UC Davis by Jorge Dubcovsky and Marcelo Soria that assists small grains researchers in the use of marker-assisted selection for a wide range of traits (26). We curated data sets for the leaf rust resistance Lr genes (e.g. Lr19), the stem rust resistance Sr genes (e.g. Sr25) and the stripe rust resistance Yr genes (e.g. Yr5). A complete list can be found here: https://wheat.pw.usda.gov/GG3/node/657. A more recent collaboration has been established with Agriculture and Agri-Food Canada and the oat community. We added two new oat (A. sativa) map data sets to GrainGenes. These are from the Pendek 39×Pendek 48 Oat-2004-P39×P48 and Pendek 48×Pendek 38 Oat-2004-P48×P38 populations (29). We have also begun to curate and include additional 'fragmentary' maps from the historical literature and from more recent studies. New consensus maps for oat have also been added (30-34).
In addition, to reconcile marker names, GrainGenes has embarked on a major effort to add probes, marker types and locus orthology groups to all existing historical markers in GrainGenes, thus enabling CMap to create more correspondences among existing maps in GrainGenes through common markers. Also, GrainGenes is working with the AgBioData Consortium to develop common data standards (35). More information about recent updates to the GrainGenes database can be found here: https://wheat.pw.usda.gov/GG3/GGupdates.

New tools and resources at GrainGenes: GSP and PIECE2
GrainGenes has added the web-based tool GSP (Genome Specific Primers), designed to identify genome-specific targets in polyploid species (https://probes.pw.usda.gov/GSP/; 36), to its existing tool set (Supplementary Figure 2). When users enter sequence(s), GSP calculates and predicts primer sets that will amplify polymerase chain reaction product(s) from a specific subgenome of a polyploid species, such as T. aestivum, Panicum virgatum or Gossypium hirsutum. PIECE version 2 (Plant Intron Exon Comparison and Evolution; 37), a database for studying plant gene structure and evolution, is now a part of GrainGenes (http://probes.pw.usda.gov/piece) as well. We encourage users who would like to learn more about GSP and PIECE2 to read the following publications, which describe the tools in detail and provide use cases that demonstrate how these tools can be useful for research (36,37).
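The idea of a genome-specific target can be illustrated with a minimal sketch. This is not the GSP implementation (36); it assumes that homoeologous sequences from each subgenome have already been aligned to equal length, and it only reports alignment columns where the target subgenome differs from every other subgenome. Such columns are candidate positions for the 3' end of a subgenome-specific primer. The function name and the toy sequences below are hypothetical and are not real wheat data.

```python
def subgenome_specific_sites(aligned, target):
    """Return 0-based alignment columns where `target` differs from all other sequences.

    `aligned` maps a subgenome label to an aligned sequence of equal length.
    Columns containing a gap ('-') in any sequence are skipped.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")

    others = [seq for name, seq in aligned.items() if name != target]
    sites = []
    for i, base in enumerate(aligned[target]):
        column = [base] + [seq[i] for seq in others]
        if "-" in column:
            continue  # gaps are not usable as discriminating bases in this sketch
        if all(base != seq[i] for seq in others):
            sites.append(i)
    return sites


# Toy, made-up homoeologous sequences (illustrative assumption only).
toy_alignment = {
    "A": "ATGGCCTTAGCA",
    "B": "ATGGCGTTAGCA",
    "D": "ATGACCTTAGTA",
}

if __name__ == "__main__":
    for genome in toy_alignment:
        print(genome, subgenome_specific_sites(toy_alignment, genome))
```

In this toy alignment, the B copy has one private base and the D copy has two, while the A copy has none, so no A-specific site would be reported. A real primer design tool must additionally consider melting temperature, product size and off-target matches, which this sketch ignores.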

Data discovery through the Wheat Information System
The Wheat Information System (WheatIS) is an international collaborative website led by researchers at Unité de Recherche Génomique Info (URGI) under the Institut National de la Recherche Agronomique in Versailles, France.
GrainGenes is part of the WheatIS Expert Working Group that consists of representatives from universities, governments and industries with the goal of sharing information and best practices for storing, analyzing and displaying wheat genomic sequence data (38,39). URGI is the largest contributor to WheatIS, followed by Ensembl Plants (40). The full list of collaborators and contributors can be found at http://wheatis.org/Collaborators.php and in Alaux et al. (39). WheatIS indexes wheat data sets from globally distributed databases to make them searchable at wheatis.org and provides links back to databases. As a result, instead of visiting multiple databases separately, users can use the WheatIS website to query and find information distributed across different databases in one single location. The data providers have several options for their data to be indexed at WheatIS, including (i) sending a comma-separated value file to the developers through the WheatIS website and (ii) installing and configuring an Apache Solr server (http://lucene.apache.org/solr/) and creating custom-formatted data sets to make them searchable at WheatIS. The data formats can be easily obtained by contacting WheatIS personnel.
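As a rough illustration of option (i), the sketch below serializes a few records to comma-separated text using Python's standard csv module. The column names, record contents and example.org URL are hypothetical placeholders; the actual WheatIS file format is not described here and must be obtained from WheatIS personnel.

```python
import csv
import io

# Hypothetical column names (assumption); the real format comes from WheatIS personnel.
FIELDS = ["entry_type", "name", "description", "url"]


def records_to_csv(records):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Placeholder records, not real GrainGenes entries.
example_records = [
    {
        "entry_type": "QTL",
        "name": "example-qtl-1",
        "description": "Placeholder QTL record, leaf rust",
        "url": "https://example.org/qtl/1",
    },
    {
        "entry_type": "Genetic map",
        "name": "example-map-1",
        "description": "Placeholder map record",
        "url": "https://example.org/map/1",
    },
]

if __name__ == "__main__":
    print(records_to_csv(example_records))
```

Each row carries a link back to the source record, which mirrors the way WheatIS search results point users to the originating database. Using csv.DictWriter rather than manual string joins ensures that fields containing commas are quoted correctly.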
As a service to the global small grains community, GrainGenes indexed 548 QTLs, 91 genetic maps, 10 physical maps and 14 411 germplasm records at WheatIS with links back to appropriate GrainGenes pages. In addition, GrainGenes collaborated with the Wheat Gene Catalogue team to index 3119 genes from the Catalogue at WheatIS with links to the appropriate pages at the Komugi database, located in Japan (https://shigen.nig.ac.jp/wheat/komugi/genes/symbolClassList.jsp).

Training and public outreach
While GrainGenes attempts to provide intuitive interfaces, gaining the knowledge required for the most effective use of the GrainGenes database and its related tools is critical to our users. To this end, we created the GrainGenes YouTube Channel (https://wheat.pw.usda.gov/GG3/tutorials). To date, three training videos have been posted on YouTube for various tools and features offered at GrainGenes: (i) 'Navigating the GrainGenes homepage', (ii) a database browser demo and (iii) a demo on 'Obtaining FASTA sequences from Markers'. In addition, PDF slides converted from the video tutorials are also available, as well as slides on how to use the genome browsers, CMap and the Quick Queries tool. In accordance with the GrainGenes project mandate, the GrainGenes team will continue to create training videos for the users.
Important scientific articles and breakthroughs in the area of small grains research are shared on the GrainGenes homepage. To improve public outreach, social media accounts have been created for GrainGenes on Facebook and Twitter to (i) increase the reach of GrainGenes updates, (ii) attract new users, (iii) inform users about new tools and (iv) broadcast news of new data at GrainGenes. GrainGenes also provides a free service for the announcement of relevant job openings and meetings. Such public outreach efforts help GrainGenes to continue serving as a hub for small grains communication.

Conclusion and future steps
With the decreasing cost of genome sequencing technologies and the improvement of assembly algorithms, we expect more small grains genomes to be sequenced and assembled in the near future. In addition, new methods such as genotyping-by-sequencing (41,42) are creating large amounts of data, reflecting the vast diversity and complexity of small grains genomes. More pangenomes that integrate genome assemblies with diversity datasets are also expected to become available. The integration of large amounts of data will be increasingly challenging, but will also be increasingly valuable. Because biological repositories are tasked with managing, curating, storing, querying and visualizing an ever-increasing volume and diversity of data, more resources will be required to store, analyze and link these datasets, and more efficient web-based tools must be implemented. GrainGenes recognizes these challenges. Our future objectives include developing semi-automated curation workflows, implementing improved web-based visualization and query tools and developing computational pipelines to create integrated views. GrainGenes will continuously seek feedback from its users and the GrainGenes Liaison Committee to enhance GrainGenes' value for the broader small grains community.

Supplementary data
Supplementary data are available at Database Online.