The Flora Incognita app – Interactive plant species identification

Being able to identify plant species is an important factor for understanding biodiversity and its change due to natural and anthropogenic drivers. We discuss the freely available Flora Incognita app for Android, iOS and Harmony OS devices that allows users to interactively identify plant species and capture their observations. Specifically developed deep learning algorithms, trained on an extensive repository of plant observations, classify plant images with unprecedented accuracy. By using this technology in a context‐adaptive and interactive identification process, users are now able to reliably identify plants regardless of their botanical knowledge level. Users benefit from an intuitive interface and supplementary educational materials. The captured observations in combination with their metadata provide a rich resource for researching, monitoring and understanding plant diversity. Mobile applications such as Flora Incognita stimulate the successful interplay of citizen science, conservation and education.


| INTRODUCTION
The global loss of biodiversity is among the most urgent environmental problems of our time, threatening to compromise stability and functioning of ecosystems (Barnosky et al., 2012; Ceballos et al., 2015; Dirzo et al., 2014). Ongoing conservation efforts require an accurate understanding of spatiotemporal patterns of biodiversity and their change over time (Chapman & Busby, 1994).
Deficiencies in both the quality and the availability of biodiversity data currently prevent data-driven conservation decisions, such as land-use planning and species conservation assessments (Boakes et al., 2010; Geijzendorffer et al., 2016; Proença et al., 2017).
Governments and scientific agencies typically lack the resources to fund long-term biodiversity assessments by professional scientists and therefore recruit volunteers, both beginners and experts, to meet their assessment goals (Miller-Rushing et al., 2012).
Biodiversity monitoring is a labour-intensive task, heavily relying on individual expertise to correctly identify species in the field. In-situ species identification is almost impossible for untrained people and challenging even for professionals, putting it beyond the reach of many nature enthusiasts (Bonnet et al., 2018). The situation is further aggravated by the increasing shortage of skilled taxonomists (Frobel & Schlumprecht, 2014; Hopkins & Freckleton, 2002). For these reasons, there has long been interest in developing automated species identification systems (Gaston & O'Neill, 2004; Wäldchen & Mäder, 2018b). Initial image-based approaches were proposed 15 years ago but have only now become a reliable alternative to manual identifications (Affouard et al., 2017; Jones, 2020; Seeland et al., 2019; Wäldchen & Mäder, 2018a; Wäldchen et al., 2018).
Recent boosts in data availability accompanied by substantial progress in machine learning algorithms, notably deep convolutional neural networks (CNNs; LeCun et al., 2015), pushed these approaches to a 'production-ready' state. Automated species identification can now significantly contribute to biodiversity and conservation research. In this paper, we describe Flora Incognita, a mobile application for automated plant species identification and observation recording. The application combines novel advances in machine learning-enabled identification with a plant species field guide. Flora Incognita differs from alternative plant identification systems by taking a multi-modal and interactive approach that not only analyses a single image depicting an unknown plant but incorporates habitat information and queries the user for images of one or more complementary plant organs to deliver a precise identification (Wittich et al., 2018).

| Interactive identification process
The first user action in the identification process is choosing which growth form the unknown species belongs to. We distinguish between forbs, grasses, ferns and trees to request specific image perspectives, for example, 'Take an image of the tree's stem'. We found that by requesting these detailed perspectives, users typically take better pictures. In addition to automatic image classification, we also incorporate environmental variables in the identification process. In parallel to querying a user about the growth form of the unknown plant, we automatically transfer the occurrence's geolocation (given the user's consent) and date to predict a prior for the later image analysis. This prediction incorporates the botanical, geographical and climatic context of an observation and provides a first hypothesis about which species are more or less likely to be identified in the current observation (Wittich et al., 2018). In an adaptive number of succeeding interactions, we ask the user to take images of growth form-specific organs, such as flower or leaf. These images pass an immediate plausibility check, notifying the user if the image is unlikely to depict part of an actual plant, before being uploaded for classification. The process terminates when either a species has been identified with a sufficiently high score or no additional information has been gained in the preceding interactions. We run intensive studies to determine which perspectives are the most characteristic per growth form and are easy for a user to acquire (Rzanny et al., 2017).
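The termination rule described above can be sketched as a short loop. This is only an illustration under stated assumptions, not the app's implementation: the function name `identify`, the thresholds `min_score` and `min_gain`, and the representation of each interaction as a non-empty dictionary of species scores are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Scores = Dict[str, float]  # species name -> score in [0, 1]


def identify(
    interactions: Iterable[Scores],
    min_score: float = 0.9,   # hypothetical confidence threshold
    min_gain: float = 0.01,   # hypothetical minimum improvement per interaction
) -> Tuple[List[Tuple[str, float]], int]:
    """Consume per-interaction scores until a stopping condition holds.

    Each element of `interactions` is assumed to be a non-empty score
    dictionary. Returns the ranked candidates and the interactions used.
    """
    ranked: List[Tuple[str, float]] = []
    previous_best = 0.0
    used = 0
    for scores in interactions:
        used += 1
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        best = ranked[0][1]
        # Stop when one species reaches a sufficiently high score.
        if best >= min_score:
            break
        # Stop when the latest interaction added no meaningful information.
        if used > 1 and best - previous_best < min_gain:
            break
        previous_best = best
    return ranked, used


# Hypothetical scores after a first, second and third image.
steps = [
    {"Bunias orientalis": 0.62, "Sinapis arvensis": 0.30},
    {"Bunias orientalis": 0.94, "Sinapis arvensis": 0.04},
    {"Bunias orientalis": 0.99, "Sinapis arvensis": 0.01},
]
ranked, used = identify(steps)
```

In this toy run the loop stops after the second image, because the top score already exceeds the assumed threshold and a third image is never requested.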

| Deep neural network classifier
The automatic identification used in Flora Incognita is based on the latest machine learning technologies. We design three types of models and continuously retrain them on our repository of species observation data once a significant number of new observations is available. First, a CNN classifier analyses a single image and predicts a ranked list of candidate species depicted in the image. The model uses an architecture with 88.9 million learnable parameters, which was itself optimized using machine learning (Zoph et al., 2018), and is currently trained on more than 1 million plant images on a cluster of GP-GPUs (general-purpose graphics processing units) over a period of several months (full training). Second, a deep feedforward network uses location embeddings and similarity learning to predict likely species at a given location and time based on presence-absence maps, occurrence records acquired by our users and various databases covering, for example, soil type, land cover and phenological regions.
Finally, we train a recurrent neural network model with structured observations to learn an optimal fusion of the different information sources to predict a candidate species. These observations consist of multiple feature vectors extracted from the images depicting different plant organs and different perspectives of the individual by the aforementioned networks.
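To make the idea of fusing information sources concrete, the sketch below combines a contextual prior with per-image class probabilities. It deliberately swaps in a much simpler technique than the learned recurrent fusion described above: a naive product of probabilities followed by renormalisation. The function name `fuse`, the smoothing constant `eps` and all numbers are assumptions made for illustration.

```python
from math import prod
from typing import Dict, List

Probs = Dict[str, float]  # species name -> probability


def fuse(prior: Probs, image_probs: List[Probs], eps: float = 1e-6) -> Probs:
    """Combine a location/date prior with per-image probabilities.

    Naive product-of-probabilities fusion, NOT the learned recurrent
    fusion described in the text. `eps` (hypothetical) stands in for a
    species that one of the sources did not score at all.
    """
    species = set(prior)
    for probs in image_probs:
        species.update(probs)
    unnormalised = {
        s: prior.get(s, eps) * prod(p.get(s, eps) for p in image_probs)
        for s in species
    }
    total = sum(unnormalised.values())
    return {s: v / total for s, v in unnormalised.items()}


# Hypothetical prior and two image predictions (e.g. leaf and bark).
prior = {"Acer campestre": 0.7, "Acer platanoides": 0.3}
images = [
    {"Acer campestre": 0.4, "Acer platanoides": 0.6},
    {"Acer campestre": 0.5, "Acer platanoides": 0.5},
]
fused = fuse(prior, images)
```

Here the prior outweighs an ambiguous first image, so the fused ranking favours the species that is more plausible at the assumed location and date.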

| The Flora Incognita taxonomy
We initially focussed on native wild growing plants in Germany and chose a widely accepted local list of ferns and vascular plants as a taxonomic backbone (Wisskirchen & Haeupler, 1998). We adopted all included taxa at species rank except for the genera Taraxacum, Sorbus, Rubus, Pilosella, Hieracium and Oenothera. Species within these genera are challenging even for experts to identify with certainty, and it is therefore exceptionally difficult to acquire the trustworthy training data required for accurate identification at species rank within the app.
When evolving our taxonomy towards a more holistic flora covering species around the world, including those occurring in parks and gardens, we migrated to the Catalogue of Life (CoL) while maintaining the non-resolved genera as discussed above. Relying on CoL allows us to easily include new species and species groups. Furthermore, it simplifies a future data exchange with biodiversity platforms like the Global Biodiversity Information Facility (GBIF) that also derive their taxonomy from CoL. In the future, we aim to achieve greater taxonomic resolution of the non-resolved genera by engaging experts to contribute trustworthy observations of the respective species.
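The handling of the non-resolved genera can be illustrated with a small label-mapping helper. This is a sketch under assumptions: the function name `to_label` and the idea of collapsing names by string processing are hypothetical, and only the list of genera is taken from the text.

```python
# Genera kept at genus rank, as listed in the text.
UNRESOLVED_GENERA = frozenset(
    {"Taraxacum", "Sorbus", "Rubus", "Pilosella", "Hieracium", "Oenothera"}
)


def to_label(scientific_name: str) -> str:
    """Map a scientific name to a (hypothetical) classifier label.

    Names in the unresolved genera are collapsed to genus rank. All other
    names are returned unchanged apart from whitespace normalisation and
    capitalisation of the genus.
    """
    parts = scientific_name.split()
    if not parts:
        raise ValueError("empty scientific name")
    genus = parts[0].capitalize()
    if genus in UNRESOLVED_GENERA:
        return genus
    return " ".join([genus] + parts[1:])


labels = [to_label(n) for n in ("Taraxacum officinale", "Bunias orientalis")]
```

With such a mapping, observations of any Taraxacum species would contribute to a single genus-level class, while Bunias orientalis keeps its species-level label.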

| The Flora Incognita ecosystem
We designed the Flora Incognita system as a flexible client-server solution consisting of scalable micro-services running in our data centre and client applications making the identification service accessible in different usage scenarios. Conceptually, the server side consists of an observation service, an identification service and a training service (cp. Figure 1). The observation service handles user-generated observation records by storing them in a repository, making them available across a user's different devices and providing them for retraining of the identification system. The identification service realizes interactive species identification as used by our Flora Incognita app as well as batch-wise identification for complete observations already containing all data. The training service continuously retrains our identification system once a significant amount of new and manually reviewed observation data is available. Our multi-platform client software ecosystem currently consists of three apps freely available for Android, iOS and Harmony OS, and is developed using open-source web application frameworks, which enable us to maintain a modular codebase that ensures maximum reuse and consistency of functionality across applications. Our Flora Incognita app provides an interactive process that adaptively guides a user to a desired identification. In contrast, our Flora Capture app (Boho et al., 2020) encourages users to take multi-image observations which are batch-wise identified upon sync to our server, providing a digital herbarium. Finally, our Flora Expert app allows users to review observations and is currently only available to invited botanists.
Additionally, we provide an application programming interface (API) for registered external clients that allows other apps and services to incorporate our species identification.

| Application details
The Flora Incognita system currently allows users to automatically identify 4,851 vascular plant species. The app was launched in April 2018 and is freely available in 19 languages for Android, iOS and Harmony OS devices. It has been installed more than 2.75 million times around the globe. The Flora Incognita app consists of four main usage scenarios accessible from the app's home screen (cp. Figure 2):

| Identify plant
The app guides users through an adaptive process of taking one or more images depicting specific organs of an unknown plant, such as a flower or a leaf. Which images are requested and in which order is determined automatically based on an observation's context, that is, the growth form of the unknown plant, the current season and already acquired information. The identification process requires an internet connection for transferring images and metadata to the server and receiving results. In areas without network coverage, users can take images of an unknown plant with the device's camera and later import them into the identification process, in addition to all important metadata pertaining to that image (i.e. date and location). All images are analysed using a cascade of deep neural networks (cp. Figure 1) on the Flora Incognita computer cluster. The app will either suggest a single plant species or a short list of similar species ranked by the identification probability. For each species, a comprehensive fact sheet and informative images depicting different perspectives and organs are provided. Users are requested to confirm the correct species at the end of the process to commit an observation.

| My observations
After the user has confirmed the observation during the identifica-

| Species list
The species list shows all species currently identifiable with the system and can be searched by scientific name, common name, genus and life form or a combination thereof. For each species, a comprehensive fact sheet provides information about its characteristics, ecology, toxicity and status. These fact sheets are automatically adapted to the user's current location, meaning that they include the ecological, protection and distribution information relevant at that position.
Additionally, the fact sheets contain links to national floristic websites providing more in-depth information that goes beyond the average user interest. While we aim for high-quality fact sheets across all supported languages and for a global geographic range, not all information is available yet. We are continuously adding content and have designed a solution that facilitates complete flexibility. For example, we are collaborating with KOSMOS, a publisher of widely renowned analogue field guides, to integrate their species fact sheets for customers who purchased field guides from the 'Was blüht denn da?' series (Spohn et al., 2020). Future collaborations include the development of children-specific fact sheets. Where multiple fact sheets are available, the user can select the desired one via the app's settings.

| News
In the news blog, we discuss app-related topics, for example, how to take pictures most suitable for automated identification and keep users informed with frequently updated stories about topics related to plant diversity.

| Identification accuracy
We use a benchmark holdout dataset of non-trained images to continuously evaluate our solution. Our most recent classifier alone identifies the 4,851 supported plant species with a taxon-averaged accuracy of 83% based on single images. Furthermore, we studied the overall performance when acquiring, where necessary, multiple images and analysing location and time of an observation. We asked two expert botanists to manually check the same 1,000 randomly drawn real-user observations from which they felt able to assess 847 based on the available images. They found that 93% (787) of the observations were correctly and 7% (60) incorrectly identified by our app. Furthermore, they found that the majority of the 7% confused observations were cultivated relatives mixed up with the wild living species supported by our app. We are continuously expanding the set of supported species to overcome these limitations. Table 1 compares Flora Incognita with two other free research-grade plant identification apps. All three are developed within a scientific con- image observations (Jones, 2020;Shapovalov et al., 2020), unavailable or wrong geolocation preventing habitat analysis (Jones, 2020) and identifying non-supported taxa (Jones, 2020;Schmidt & Steinecke, 2019). However, Jones (2020) still concludes that the Flora Incognita app is a very valuable tool even for botanists and ecologists during field studies.
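The accuracy figures reported above can be related to each other with a few lines of arithmetic. The sketch assumes that 'taxon-averaged accuracy' means the mean of per-species accuracies (often called macro-averaging), so that frequently photographed species do not dominate the score; this reading, the function name and the toy data are assumptions. Only the counts 787, 60 and 847 come from the text.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def taxon_averaged_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Mean of per-taxon accuracies over (true_label, predicted_label) pairs."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, predicted in pairs:
        total[truth] += 1
        correct[truth] += int(truth == predicted)
    if not total:
        raise ValueError("no observations")
    return sum(correct[t] / total[t] for t in total) / len(total)


# Toy data: common species "A" dominates the sample, rare species "B" does not.
pairs = [("A", "A")] * 9 + [("A", "B")] + [("B", "A"), ("B", "B")]
overall = sum(t == p for t, p in pairs) / len(pairs)  # 10 of 12 correct
macro = taxon_averaged_accuracy(pairs)                # mean of 0.9 and 0.5

# Expert review figures from the text: 787 correct of 847 assessable.
reviewed_accuracy = 787 / 847
```

In the toy data the overall accuracy is higher than the taxon-averaged one because the well-recognised common species contributes most of the images, which is the effect that averaging per taxon is meant to remove.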

| DISCUSSION
Flora Incognita can help to detect important biological indicators for local environmental changes by providing a spatially and temporally referenced series of species occurrences. When developing Flora Incognita, we found that there is a great need for and interest in better technology to acquire biodiversity data by citizens, professional scientists and educational practitioners. Future sources of monitoring data will include semi-automatically and automatically captured data covering large spatial scales. Since species records are obtained in return for identifying plants that Flora Incognita's users are interested in, that is, crowdsourced, the collected biodiversity data are opportunistic and potential biases need to be considered for analysis. This user focus is reflected in the recorded plant species, that is, we found frequency of a species' observations to be related to user bias, with ubiquitous and conspicuous species being recorded more frequently than rare and cryptic species. Hence, the most frequently observed species represent broadly distributed, often ruderal and nitrophilic species. At the same time, more observations are collected in densely populated areas along roadsides and much less in remote poorly accessible areas away from paths and roads. Despite such peculiarities, the sheer amount of collected plant observations helps overcome these biases when data are appropriately handled in the analysis. Mahecha et al. (2021) show that after accounting for the most prominent biases, Flora Incognita observations from a single vegetation season are already sufficient to reconstruct well-known biogeographical patterns. This is a clear indication that collected observations are meaningful and can indeed support monitoring.
One application scenario for these data is the monitoring of invasive species, which are a major threat to biodiversity associated with high economic cost for our society (estimated at nearly 12 billion Euro per year in Europe; Jeschke et al., 2014; Weber & Gut, 2004).
Invasive species are often conspicuous, attract human interest and can be easily identified by an automated approach. Figure 3a illustrates the potential of our data for assessing spatial occurrence patterns of Bunias orientalis, an invasive neophyte in Germany. The species spreads rapidly and is replacing native and rare plant species from species-rich meadow and semi-dry grassland biotopes.
Early detection and rapid response are critical processes to prevent the spread and establishment of such invasive species (European Union, 2014). The Flora Incognita system provides up-to-date and high-resolution occurrence data. Nature conservation authorities already use these data to quickly initiate control measures against multiple invasive species. Furthermore, Flora Incognita can be used to launch citizen science projects studying research questions focussed on biodiversity in urban and agricultural areas. The strength of such a system lies in producing long-term data records covering a large variety of species at high spatial resolution.
Another application scenario is the monitoring of phenology, an important bioindicator for climate change. Barve et al. (2020) found
August et al. (2020).
**We solely report Google Play metrics as of January 2021 since Apple's AppStore only reports ratings per local store and does not publicly show installs per app.
***Uses single image observations (captured from a computer screen (Jones, 2020)) to gain a repeatable experimental set-up but thereby neglecting our multi-image and context analyses that have been demonstrated to substantially improve identification accuracy

ACKNOWLEDGEMENTS
Foremost, the authors thank all the users of their apps Flora Incognita and Flora Capture. Plant observations and user feedback helped them to steadily improve Flora Incognita in multiple ways.
Further, they thank all their student helpers for their help in various ways. They also thank the multitude of volunteering translators helping them to make this app available in 18 languages. Finally, they thank the handling editor and two anonymous reviewers for their valuable comments which greatly improved the manuscript.

PEER REVIEW
The peer review history for this article is available at https://publons.