WhoseEgg: classification software for invasive carp eggs

The collection of fish eggs is a commonly used technique for monitoring invasive carp. Genetic identification is the most trusted method for identifying fish eggs, but it is expensive and slow. Recent work suggests random forest models could provide an inexpensive method for identifying invasive carp eggs based on morphometric egg characteristics. While random forests provide accurate predictions, they do not produce a simple formula for obtaining new predictions. Instead, users must know the R programming language, which limits who can use the random forests for resource management. We present WhoseEgg: a web-based point-and-click application that allows non-R users to access random forests to rapidly identify fish eggs, with the objective of detecting invasive carp (Bighead, Grass, and Silver Carp) in the Upper Mississippi River basin. This article provides an overview of WhoseEgg, an example application, and future research directions.


Identify the Problematic Eggs
We discovered that the manner in which larval lengths were measured, given the egg stage, appears to differ between 2014-2015 and 2016. This section looks into the discrepancy and saves a file with all of the possibly problematic eggs.

Comments from Mike About Problematic Eggs
• Egg stages from the problematic eggs all look okay
• For the eggs in stages other than 7 and 8, we could change the larval lengths to 0 to match Carlos's approach
• For the eggs in stages 7-8, he can't get the larval lengths
• Should we take all of these eggs out?
• Should we remove only the eggs in stages 7 and 8 with non-zero larval lengths?
• Try fitting models and see how it affects the results from the various options
• Mike's email with updated eggs: "A lot of the rest I looked at I changed to stage 7, so the length was valid. I put a few comments in the column at the end." Updated data is in the file "2016 eggs with incorrect data_MW.csv"
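One of the options listed above, setting the larval length to 0 for 2016 eggs outside stages 7 and 8, could be sketched roughly as follows. This is only an illustration on made-up data: the `Egg_ID` and `Year` columns and the `eggdata_zeroed` name are assumptions, while `Egg_Stage` and `Larval_Length` follow the column names used later in these notes.

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration only)
eggdata <- data.frame(
  Egg_ID        = 1:4,
  Year          = rep(2016, 4),
  Egg_Stage     = c(2, 6, 7, 8),
  Larval_Length = c(1.1, 0.9, 3.5, 0)
)

# For 2016 eggs outside stages 7-8, overwrite the larval length with 0;
# all other rows keep their recorded value
eggdata_zeroed <- eggdata %>%
  mutate(
    Larval_Length = if_else(
      Year == 2016 & !(Egg_Stage %in% c(7, 8)),
      0,
      Larval_Length
    )
  )
```

Eggs in stages 7 and 8 are left untouched here, so any zero lengths in those stages would still need to be handled separately.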

Create Data with All Problem Observations Removed
Identify the eggs from 2016 with non-zero larval lengths with an egg stage other than 7 or 8 and the eggs from 2016 with larval lengths of zero with an egg stage of 7 or 8:
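A minimal sketch of how this rule might be expressed with dplyr is shown here. It runs on invented data and is not the original analysis code: the `Egg_ID` and `Year` columns and the `problem_eggs` and `eggdata_removed` names are assumptions made for the example.

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration only)
eggdata <- data.frame(
  Egg_ID        = 1:6,
  Year          = c(2015, 2016, 2016, 2016, 2016, 2016),
  Egg_Stage     = c(3, 3, 5, 7, 8, 7),
  Larval_Length = c(1.2, 0, 1.5, 0, 4.1, 3.8)
)

# Flag 2016 eggs whose larval length is inconsistent with the egg stage:
# non-zero length outside stages 7-8, or zero length in stages 7-8
problem_eggs <- eggdata %>%
  filter(
    Year == 2016,
    (!(Egg_Stage %in% c(7, 8)) & Larval_Length != 0) |
      (Egg_Stage %in% c(7, 8) & Larval_Length == 0)
  )

# Drop every flagged egg from the full data (Egg_ID is an assumed key)
eggdata_removed <- eggdata %>%
  anti_join(problem_eggs, by = "Egg_ID")
```

In this toy example, eggs 3 and 4 are flagged and the remaining four eggs are kept. The 2015 egg is never flagged because the rule applies only to 2016.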

Create Data with Egg Stages Corrected by Mike and Unknown Observations Removed
There are still some eggs in the data in stages 7 and 8 with a larval length of 0. Mike was not able to fix the larval length for these eggs (because the eggs had already been destroyed and the pictures did not provide good angles for measurement). The number of such eggs is given by:

dim(eggdata_corrected %>% filter(Egg_Stage %in% c(7,8), Larval_Length == 0))[1]
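As a rough sketch, counting these unresolved eggs and then dropping them could look like the following. The data are invented, and the `Egg_ID` column plus the `n_unknown` and `eggdata_corrected_and_removed` names are assumptions. `eggdata_corrected` stands in for the data after Mike's corrections.

```r
library(dplyr)

# Hypothetical stand-in for the corrected data (values invented)
eggdata_corrected <- data.frame(
  Egg_ID        = 1:5,
  Egg_Stage     = c(7, 8, 7, 4, 8),
  Larval_Length = c(0, 3.9, 4.2, 0, 0)
)

# Count the stage 7-8 eggs that still have a larval length of 0
n_unknown <- eggdata_corrected %>%
  filter(Egg_Stage %in% c(7, 8), Larval_Length == 0) %>%
  nrow()

# Keep everything except those unresolved eggs
eggdata_corrected_and_removed <- eggdata_corrected %>%
  filter(!(Egg_Stage %in% c(7, 8) & Larval_Length == 0))
```

Here `nrow()` plays the same role as `dim(...)[1]`. In the toy data two eggs are unresolved, so three rows remain after filtering.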

Save the Egg Data for WhoseEgg
Since the metric results were not very different across the models trained on the four datasets, we decided to use the dataset that is as accurate as possible: "corrected_and_removed". That is, the data includes the corrections made by Mike, and the observations that are in egg stage 7 or 8 with a larval length of 0 have been removed.
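Saving the chosen dataset might be done along these lines. This is a sketch only: the output file name and the temporary directory are placeholders rather than the paths used in the project, and the small data frame is invented so the example runs on its own.

```r
# Hypothetical stand-in for the "corrected_and_removed" data
eggdata_corrected_and_removed <- data.frame(
  Egg_ID        = c(2L, 3L, 4L),
  Egg_Stage     = c(8, 7, 4),
  Larval_Length = c(3.9, 4.2, 0)
)

# Placeholder output location; the real file name is not given in these notes
out_path <- file.path(tempdir(), "eggdata_for_whoseegg.csv")

# Write without row names so the file reads back with the same columns
write.csv(eggdata_corrected_and_removed, out_path, row.names = FALSE)

# Read the file back as a quick check that nothing was lost
reloaded <- read.csv(out_path)
```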