BrAPI—an application programming interface for plant breeding applications

Abstract Motivation Modern genomic breeding methods rely heavily on very large amounts of phenotyping and genotyping data, presenting new challenges in effective data management and integration. Recently, the size and complexity of datasets have increased significantly, with the result that data are often stored on multiple systems. As analyses of interest increasingly require aggregation of datasets from diverse sources, data exchange between disparate systems becomes a challenge. Results To facilitate interoperability among breeding applications, we present the public plant Breeding Application Programming Interface (BrAPI). BrAPI is a standardized web service API specification. The development of BrAPI is a collaborative, community-based initiative involving a growing global community of over a hundred participants representing several dozen institutions and companies. Development of such a standard is recognized as critical to a number of important large breeding system initiatives as a foundational technology. The focus of the first version of the API is on providing services for connecting systems and retrieving basic breeding data including germplasm, study, observation, and marker data. A number of BrAPI-enabled applications, termed BrAPPs, have been written, that take advantage of the emerging support of BrAPI by many databases. Availability and implementation More information on BrAPI, including links to the specification, test suites, BrAPPs, and sample implementations is available at https://brapi.org/. The BrAPI specification and the developer tools are provided as free and open source.


Introduction
Plant breeding is widely recognized as crucial to feeding a rapidly growing population, especially in developing countries (Flavell, 2017), (http://www.fao.org/fileadmin/templates/wsfs/docs/expert_ paper/How_to_Feed_the_World_in_2050.pdf). To meet this demand, it is necessary to breed new varieties that maintain high productivity with reduced inputs and are adapted to new eco-agricultural environments resulting from climate change. Plant breeding is a complex undertaking that necessarily integrates many interrelated disciplines, each with their own conventions for data structure and storage, and increasingly large, multi-faceted datasets.
To address the challenges in the size and complexity of breeding data, a number of database systems have been designed over the years to solve specific problems. Although the power and insights that can be gleaned from large datasets increase with a greater volume and diversity of data sources, these separate systems make data integration difficult. Breeders need seamless access to all relevant data, but each system tends to keep its data siloed with ad hoc formats that hinder the ability to exchange, compare and combine data across research teams.
To meet these requirements, numerous groups have been working together to create an Application Programming Interface (API) for breeding data (Ghouila et al., 2018). An API specification describes the functions and services available in an application which can be accessed in an automated way by a computer program. It describes what services are available, what inputs are allowed, what the structure of the output data will be, and the protocol used to pass data to a service, often on the web. In recent years, web services have become the major paradigm for information exchange on the web, and web service standards have also been defined and implemented successfully by the bioinformatics community. Examples of such systems include the Distributed Annotation System (DAS) (Dowell et al., 2001), BioMOBY (Wilkinson and Links, 2002), and the EMBRACE (Pettifer et al., 2010) Web Service collection.
Most of the modern web service infrastructure follows the REST standards (Fielding and Taylor, 2002). REST stands for 'Representational State Transfer' and defines a stateless client/server communication architecture, built on the HyperText Transfer Protocol (HTTP) (https://tools.ietf.org/html/rfc7231). In a RESTful API, HTTP is the communication protocol and the available services are defined as Unified Resource Locators (URLs). Typically, the inputs are defined by constructing a URL with query parameters defined by the API (or HTTP request body objects for more complex inputs), the output data are usually returned in a defined structure. For the output, historically, XML was used, but newer APIs typically prefer the Javascript Object Notation (JSON) format.
Data exchange requires solutions on many levels, including the semantic level and the syntactic level (Doan et al., 2004). For breeding data, standardization of the semantic level has made significant progress over the last few years through the definition of ontologies for describing plant structure and development (Cooper et al., 2018), and for describing traits in popular crops (Shrestha et al., 2012). However, the breeding community still needs to standardize data at the syntax level. This can be achieved by defining a standardized Application Program Interface.
Here, we report on the design and implementation of a standard RESTful Breeding API (BrAPI), as a specification with a focus on common plant breeding data requirements. The interface was designed by members of the BrAPI consortium. A complete list of contributors is given in the consortium description and a continuously updated list can be found on the BrAPI website (https://brapi.org/).

Results
The Breeding API is a practical tool to help solve problems in accessing, exchanging, and integrating data across systems and applications. Given the multidisciplinary nature of plant breeding, there is a broad range in the particulars of the possible data operations that could be considered. Since a complete list of BrAPI related use cases would grow unmanageably large, we decided to focus on a small number of main use cases to design the primary API elements with a view towards reusability in other use cases.

Use cases
These are the main use cases we considered: 2.1.1 Field phenotyping apps Trials are often performed in fields that have limited internet connectivity, requiring special solutions for collecting phenotypic data. A popular approach is to collect data using handheld devices paired with custom mobile apps (Rife and Poland, 2014). Information about the field, the plot and accession identifiers needs to be loaded on the device before phenotypic data collection. After completion, collected data need to be uploaded to the database, when internet connectivity is available. Currently available solutions require custom files to be transferred, often involving significant user intervention. However, a simpler method would be to use an API to retrieve and store the data directly from the database.

Sample tracking
For both phenotyping and genotyping applications, analyses may need to be run by service providers, such as analytical labs and genotyping centers, that use different tracking mechanisms. The sample tracking use case describes the hand-off of the sample information to the service provider, and the subsequent retrieval of the results. In practice, tracking samples can be complex because the identifiers from several different systems must be correlated.

Genome visualization and analysis
Genome-based breeding requires extensive genotyping, which can be helpful to visualize in different ways to aid in breeding decisions. An example of such a tool is Flapjack (Milne et al., 2010), which can display a number of genotypes and run analyses on the data. BrAPI standardizes the interfaces for such tools, hence they can be used with a much wider range of data sources and without the need for special adaptations for each source.

FAIR data portals
One of the challenges of big data is identifying datasets of interest and ensuring their long term availability. This can be addressed by building federations of Findable, Accessible, Interoperable and Reusable (FAIR) data repositories (Wilkinson et al., 2016). Interfaces such as BrAPI can help such efforts by standardizing access to the data repositories, thereby creating federations. Portals to the federated data can then be deployed to provide general or community specific data access. This increases the visibility of all datasets and therefore reduces the risk of losing isolated datasets over time. The portals should implement simple searches on standard metadata, such as MCPD or MIAPPE (Ć wiek-Kupczy nska et al., 2016; Krajewski et al., 2015;Milne et al., 2010).

Data integration and exchange
In this use case, two databases exist with overlapping data as well as specific data in each database. Database A would like to access data in database B. For example, database A may contain information about accessions, such as phenotypic and trial metadata, while database B contains genotypic information. Using a BrAPI call, database A can extract the genotyping data from database B and use that data in breeding decision support.

API definition
The BrAPI definition is kept in the 'API' repository of the 'plantbreeding' organization on GitHub (https://github.com/plantbreed ing/API), with all changes to the definition managed using GitHub's 'issues', 'projects' and 'pull requests' facilities.

API organization
BrAPI calls are organized into categories that reflect the major domains needed for exchanging data between plant breeding information management systems and client applications. Some example categories include Studies, Germplasm, Traits, Trials, MarkerProfiles and Authentication. (A full list of the categories is presented in Table 1.)

URL structure
All BrAPI calls follow a common URL structure. The URL starts with a domain name (and optional base path of the implementation server) followed by '/brapi/' and the major version number. Next, the call name appears with optional object ids and other parameters. Most calls use the HTTP request method 'GET', but some require 'POST' and 'PUT', as specified in the documentation. For security, the use of SSL (HTTPS) is highly recommended for all BrAPI endpoints.

Return object structure
We have defined a standard JSON formatted response structure that is common across all calls. The standard response consists of a JSON object with a 'metadata' key and a 'result' key. The 'metadata' key provides the pagination information, an array of status information, and an array of data files. If the response data contain an array of entities which could possibly grow large, the 'pagination' object will be populated with the keys 'pageSize', 'currentPage', 'totalCount', 'totalPages' containing the appropriate values. If the response is a single entity that does not require pagination, then the 'pagination' object still must be returned, but all data elements within it should be set to zero. All pages are zero indexed, so the first page will always be page zero. The 'status' array contains a list of objects with the keys 'code' and 'message'. These status objects should be used to provide additional status or log information about the call. If the call was completed successfully and there are no status objects reported, an empty array should be returned. The 'datafiles' array contains a list of URLs to any extra data files generated by the call. For example, this could be images related to the data returned, or large data extract files which contain more data than that returned in the response payload, see Figure 1 for an example.
The data payload 'result' contains the specific model object for the given call response. There are three basic patterns that response objects follow. The 'master' pattern is used for returning all the data associated with a single entity. The 'details' pattern is used to return an array of entities. In the 'details' pattern, the 'result' object always contains a single array called 'data' and no other fields. The 'master/ details' pattern is a combination of the 'master' and 'details' patterns. It is used to represent a parent object which has an array of child entities. The 'result' object contains some data associated with the parent as well as the 'data' array with all the child entities. Whenever the 'data' array is present, the response is assumed to be paginated. This means the size of the 'data' array is always limited by the 'pagination' object in the 'metadata'.
In most cases, all the data will be contained within the JSON response. For large 'data' arrays, several requests might need to be made to retrieve every page of the array. In the event that the size of the data package exceeds what could reasonably be handled using the HTTPS protocol and the client, the service provider can place the data in a file and provide a link in the 'datafiles' array, to be downloaded later.

Authentication
A user or system may have to authenticate to a server to access protected data. BrAPI is a data communication specification, so the authentication scheme used to protect that data is considered outside the scope of the BrAPI specification. However, authentication and authorization are important topics to address whenever any kind of data is moved or presented. In order to facilitate communication of data between tools in a standardized way, the BrAPI community has developed a set of best practices using the OAuth2 architecture for implementing proper authentication with any BrAPI enabled tools and databases. In its most basic form, the OAuth2 architecture is a sessionless, token based architecture. OAuth2 allows users to sign in with user credentials they already have, and provides a token. This token can then be used to authenticate that user within different tools and databases. The token should be added as a header in every BrAPI request.

Versioning
All software projects need the ability to evolve to reflect changing requirements, to cover new use cases, and to incorporate user Provides support for sending sample metadata to an external vendor for processing (ie gene sequencing) 5 Note: In each category, there are one or more calls that provide services to support the corresponding domain of plant breeding data management.

Community
For a communication standard like BrAPI to be successful, there must be people and organizations willing to contribute and use it. Early on in the development of BrAPI, we recognized the need to foster and develop a strong community of users. This community has grown rapidly over the past few years and it now has representatives from several dozen different organizations from around the world. The development of BrAPI is a community effort. Work on the API is mainly organized around regular 'hackathons', where BrAPI contributors gather for a week of discussions and API design work. BrAPI community institutions take turns organizing and hosting the hackathons. This has proven very effective for collaborative development and capacity building (Ghouila et al., 2018). Between the hackathons, the proposed APIs are implemented at the different sites, and problems encountered during implementation are fed back into the design at the following hackathon. An important role in the community is played by the BrAPI coordinator, who helps to organize the hackathons and workshops, reviews and coordinate proposals for new or updated calls, provides support for implementers, and maintains the documentation and the BrAPI website.

Brapi.org
To serve the developer community, a website (https://brapi.org) was created as a nexus of all BrAPI related tools and information. It provides the official documentation for the API as well as information on meetings, hackathons, community news, testing tools, development libraries, BrAPPs, and a community forum.

Server implementations
BrAPI server implementations have been created for a number of popular breeding, genebank and plant genomics databases. A variety of languages and database systems have been used to develop BrAPIcompliant systems. Web frameworks' languages include Drupal/ Tripal (PHP), Catalyst (Perl), Java Spring (Java), NodeJS (JavaScript), Django (Python) whereas databases and data query systems include Postgres, MongoDB, Elasticsearch, HDF5, and MySQL. Many of these systems are open source, so their code may be adapted for other systems with similar implementation parameters. A list of current BrAPI server implementations is given in Table 2.

Client implementations
BrAPI client code libraries have been created in several languages, such as Java (https://github.com/imilne/jhi-brapi), the BrAPI R package (https://github.com/CIP-RIU/brapi), Brapi Drupal for PHP, and brapi.js for Javascript (https://github.com/solgenomics/BrAPI-js). A non-exhaustive list of current client applications is given in Table 3. It is possible for service providers to use BrAPI for the implementation of native website features. Some of these features have been implemented as reusable BrAPI compliant widgets, which we call BrAPI Apps or 'BrAPPs' for short. The available BrAPPs are listed on the BrAPI website (https://brapi.org/brapps.php). Figure 2 shows a screenshot of an example BrAPP which performs graphical filtering of phenotypic values.

Test suites and fixtures
Comprehensive testing is very important for any software project. Testing tools are available for both BrAPI server implementations and BrAPI enabled clients.
For testing BrAPI enabled clients, a BrAPI test server is available at the brapi.org site (https://test-server.brapi.org/brapi/v1). The BrAPI Test Server has a complete implementation of the BrAPI specification and returns a consistent sample set of data. This allows developers of clients to build tests which are appropriate for their tool, while calling a live BrAPI server implementation. The sample data reported by the test server are completely fabricated, and can be updated at any time upon request.

BrAPI validator (Brava) test tool
For testing server implementations, the BRAVA test client is available for testing compliance with the BrAPI specification (http:// webapps.ipk-gatersleben.de/brapivalidator/). Available as a web frontend, BRAVA enables developers to check the compliance of their BrAPI endpoints against the specification and the referential integrity of input and output parameters of dependent endpoints. The frontend, as shown in Figure 3, enables testing of BrAPI server implementations. A user can also schedule tests and generate periodic reports of the overall status and details of BrAPI endpoint compliance. The compliance tests and results are grouped by and aggregated per REST resource. Using the BrAPI meta-endpoint '/ calls', BRAVA is able to detect the available endpoints on the server and will only test those endpoints.
A given endpoint might be tested multiple times with different inputs or HTTP methods. Each test checks the HTTP status code, content type, validity of response body, and response data types. Each test will also compare the response to the expected JSON schema which defines the structure of a JSON object and acceptable types. Some tests check the compatibility of response data to a corresponding parameter. For example, a test will call '/germplasmsearch' and will use the first 'germplasmDbId' from the response to make the call '/germplasm/{germplasmDbId}'. Some tests will compare a response value to a previously stored value. For example, an entity accessed by calling '/germplasm/1' must have a 'germplasmDbId' of '1'.
When a test run is complete, the test suite result is sent to the web client and a report is generated. The report can be inspected in the client and has a tree-like structure to analyse results for individual calls. The scheduled and public resource test reports are stored for future assessment.

Discussion and Outlook
We have defined a first version of a plant breeding API that defines the key calls needed to exchange information about germplasm, phenotypes, experiments, studies, geographic locations, samples, and genetic markers. This opens the door to a rich set of possibilities for building client applications that can work with any BrAPIcompliant data provider.
Since 2015, a diverse group of data providers and client application programmers have been building BrAPI into their software. Client applications can rely on the standard interface to enable integration with any BrAPI data source. Building software using standard interfaces is an efficient and sustainable coding practice which enables the reuse of software components. As the public plant breeding software community is relatively small, this will be essential for creating a feature-rich breeding software ecosystem. A good example of the efficient reuse of components can be seen in the community developed BrAPPs, which are tools that make extensive use human friendly documentation of BrAPI formats and concepts. Furthermore, it will also provide reference concepts and schemas necessary to integrate BrAPI with other initiatives such as bioschemas.org.
Beyond breeding applications, BrAPI has also found a niche in gene bank applications, such as MGIS (Ruas et al., 2017), through compatibility with the Multi-Crop Passport Data standard (MCPD). Although the initial intent was to enable interoperability between breeding management resources, BrAPI can also be used with other types of databases, such as plant genetic resources databases [i.e. MGIS (Ruas et al., 2017)] and plant genome databases (i.e. SGN, MaizeGDB, etc.). BrAPI offers a way to link genetic resources distributed by gene banks with materials used in breeding programs. Improved integration between gene banks and plant breeding management databases, and genomic databases has the potential to greatly enhance the management and utilization of plant germplasm collections (Ruas et al., 2017;Spindel and McCouch, 2016). Efficient and smart use of genetic diversity is a key for continued progress in plant breeding efforts to address the challenges of increased productivity and adaptation (Halewood et al., 2018).
As the needs and technologies of our community continue to evolve, we expect BrAPI to grow to meet those needs.

Getting involved
We invite the reader to join our community and contribute to the future of BrAPI. To start, please visit https://brapi.org/ to learn more, contact the BrAPI coordinator at brapicoordinatorselby@gmail.com to join the mailing list, Slack channel, and community forum.