Pyafscgap.org: Open source multi-modal Python-based tools for NOAA AFSC RACE GAP

Summary
NOAA AFSC's Groundfish Assessment Program produces longitudinal catch data which support ocean health research and fisheries management (Fisheries, n.d.). These "hauls" report in what quantities and locations bottom trawl surveys find different marine species along with environmental conditions at the time and place of observation (Heifetz, 2002). Increasing usability for communities of diverse programming experience, Pyafscgap.org offers query language compilation, memory-efficient algorithms for "zero-catch" inference, and interactive visual analytics for these economically and scientifically important GAP datasets. Altogether, this research toolset supports investigatory tasks across survey programs' locations and broadens access through game and information design.

Statement of need
Pyafscgap.org reduces barriers for use of NOAA AFSC RACE GAP data, offering:
• Improved developer usability.
• Memory-efficient algorithms for zero catch inference.
• Zero-code visualization tools.
Altogether, these open source tools extend the reach and approachability of GAP's multiple survey programs to support analyses like longitudinal catch per unit effort (CPUE) in the context of environmental changes (Pottinger, 2023b).

Developer usability
Working with these data requires knowledge of tools outside the Python "standard toolset," like the closed-source ORDS query language ("Oracle Rest Data Services," 2022). While the afscgap package offers easier access to the official REST service, it also crucially offers ORDS compilation, documented types, and lazy access to these large datasets. Together, these tools let Python developers interact with these data efficiently through familiar patterns: type checking, standard documentation, and compatibility with common Python data-related libraries.
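To illustrate what "ORDS compilation" could look like, the following is a minimal sketch, not the afscgap implementation. It assumes an ORDS-style JSON filter object with `$eq`, `$gte`, `$lte`, and `$between` operators; the function name `compile_ords_filter` and the field names `year`, `srvy`, and `depth_m` are hypothetical and chosen only for illustration.

```python
import json


def compile_ords_filter(params):
    """Compile plain Python values into an ORDS-style filter object.

    Hypothetical helper: a scalar means equality, a (low, high) pair means
    a range in which either bound may be None, and None means no constraint.
    """
    compiled = {}
    for field, value in params.items():
        if value is None:
            continue  # No constraint requested for this field.
        if isinstance(value, (tuple, list)):
            low, high = value  # Assumes exactly two elements.
            if low is not None and high is not None:
                compiled[field] = {"$between": [low, high]}
            elif low is not None:
                compiled[field] = {"$gte": low}
            elif high is not None:
                compiled[field] = {"$lte": high}
        else:
            compiled[field] = {"$eq": value}
    return compiled


# Example: survey years 2015 through 2021 in one survey, at most 100 m deep.
example = compile_ords_filter(
    {"year": (2015, 2021), "srvy": "GOA", "depth_m": (None, 100)}
)
print(json.dumps(example))
```

The point of the design is that callers pass ordinary Python scalars and tuples and never write the query language by hand.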

Record inference
On their own, surveys within the API struggle to support some investigations because they provide "presence-only" data (Kenney & Roberson, 2022). For example, the API may readily yield total mass of Pacific cod but not its geohash-aggregated CPUE (Niemeyer, 2008). Metrics like CPUE need "absence data," that is, hauls in which the species was not recorded. This package can efficiently infer those results (Pottinger, 2023b).
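A small worked example shows why absence data matter for a metric like CPUE. This is a sketch with made-up numbers: the haul records, the field names `area_ha` and `cod_kg`, and the function `mean_cpue` are all hypothetical and do not come from the GAP datasets or the afscgap package.

```python
def mean_cpue(records):
    """Average per-haul catch per unit effort (kg per hectare swept)."""
    rates = [record["cod_kg"] / record["area_ha"] for record in records]
    return sum(rates) / len(rates)


# Four hypothetical hauls in one area; two of them caught no Pacific cod.
ALL_HAULS = [
    {"haul": 1, "area_ha": 2.0, "cod_kg": 10.0},
    {"haul": 2, "area_ha": 2.0, "cod_kg": 0.0},
    {"haul": 3, "area_ha": 2.0, "cod_kg": 0.0},
    {"haul": 4, "area_ha": 2.0, "cod_kg": 6.0},
]

# A presence-only source would report only the hauls with a catch.
PRESENCE_ONLY = [record for record in ALL_HAULS if record["cod_kg"] > 0]

print(mean_cpue(PRESENCE_ONLY))  # 4.0 kg/ha, ignoring the empty hauls
print(mean_cpue(ALL_HAULS))      # 2.0 kg/ha once zero catches are included
```

In this toy case, dropping the two empty hauls doubles the estimate, which is the bias that zero catch records are meant to remove.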

Broad accessibility
Though the afscgap Python package makes GAP catch information more accessible, the data's size and complexity complicate comparative analysis between species, years, and/or geographic areas (Pottinger, 2023b). Without deep developer experience, it may still be difficult to get started, even with a scientific background. To address a broader audience, this project offers visualization on top of afscgap with CSV and Python code export as a bridge to further analysis.

Functions
This project improves accessibility of GAP data and offers approachable tools to kickstart analysis.

Efficient facade
The afscgap library manages significant complexity to offer a simple, familiar interface to Python developers:
• Lazy "generator iterables" increase accessibility by encapsulating logic for memory-efficient pagination and "data munging" behind Python-standard iterators (Hunner, 2019).
• Decorators adapt diverse structures to common interfaces in zero catch data, offering polymorphism that helps to reduce the complexity of code using the library (Shvets, 2023a).
• Providing a single object entry-point into the library, a "facade" frees users from needing deep understanding of the library's types and transparently compiles "standard" Python types to Oracle REST Data Service queries (Shvets, 2023b).
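The first bullet above, lazy pagination behind a standard iterator, can be sketched as follows. This is an illustrative pattern rather than the library's code: `iterate_records`, `make_fake_service`, the `fetch_page(offset, limit)` callable, and the raw field names are all assumptions, and the in-memory fake stands in for the remote REST service.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical raw rows, shaped as a remote service might return them.
SAMPLE_ROWS = [
    {"species": " Gadus macrocephalus ", "weight": "12.5"},
    {"species": "Gadus chalcogrammus", "weight": "3"},
    {"species": "Hippoglossus stenolepis", "weight": "40.25"},
]


def make_fake_service(rows: List[Dict]) -> Tuple[Callable, List[int]]:
    """Build an in-memory stand-in for a paginated API plus a request log."""
    calls: List[int] = []

    def fetch_page(offset: int, limit: int) -> List[Dict]:
        calls.append(offset)  # Log each request to show when pages load.
        return rows[offset:offset + limit]

    return fetch_page, calls


def iterate_records(fetch_page: Callable[[int, int], List[Dict]],
                    page_size: int = 2) -> Iterator[Dict]:
    """Yield cleaned records, requesting each page only when it is needed."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return  # An empty page signals that the service is exhausted.
        for raw in page:
            # "Data munging": trim whitespace and parse numeric strings.
            yield {
                "species": raw["species"].strip(),
                "weight_kg": float(raw["weight"]),
            }
        offset += len(page)


fetch, request_log = make_fake_service(SAMPLE_ROWS)
for record in iterate_records(fetch):
    print(record["species"], record["weight_kg"])
```

Because the function is a generator, a caller that stops after the first record triggers only one page request, and at most one page is held in memory at a time.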

Zero catch inference
"Zero catch" inference enables a broader range of analysis with the following algorithm:
• Lazily paginate while records remain available from the API service.
  - Record species and hauls observed from API-returned results.
  - Return records as available.
• Lazily generate inferred records after API exhaustion.
  - For each species observed in API results, check if it had a record for each haul in a hauls flat file (Pottinger, 2023c).
  - For any hauls without the species, produce a record from the iterator.
Note that afscgap performs Python emulation of ORDS filters on inferred records.
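The steps above can be sketched as a single generator. This is a simplified illustration under stated assumptions, not the afscgap implementation: the function `iterate_with_zero_catch`, the record fields, and the `keep` predicate (standing in for Python emulation of the query filters) are hypothetical, and hauls are reduced to bare identifiers rather than rows of a flat file.

```python
def iterate_with_zero_catch(api_pages, all_hauls, keep=lambda record: True):
    """Yield API records first, then inferred zero catch records.

    api_pages: iterable of pages (lists of records) already filtered by
        the remote service.
    all_hauls: identifiers of every haul, e.g. read from a hauls flat file.
    keep: predicate emulating the query filters on inferred records.
    """
    seen_species = set()
    seen_pairs = set()

    # Phase 1: pass API results through while noting what was observed.
    for page in api_pages:
        for record in page:
            seen_species.add(record["species"])
            seen_pairs.add((record["species"], record["haul"]))
            yield record

    # Phase 2: after exhaustion, emit a zero for each missing combination.
    for species in sorted(seen_species):
        for haul in all_hauls:
            if (species, haul) in seen_pairs:
                continue
            inferred = {
                "species": species,
                "haul": haul,
                "weight_kg": 0.0,
                "inferred": True,
            }
            if keep(inferred):
                yield inferred


# Two hypothetical API pages and three known hauls.
PAGES = [
    [{"species": "cod", "haul": 1, "weight_kg": 5.0}],
    [{"species": "pollock", "haul": 2, "weight_kg": 3.0}],
]
HAULS = [1, 2, 3]

for record in iterate_with_zero_catch(PAGES, HAULS):
    print(record)
```

Only the set of observed species and haul pairs is retained in this sketch, so records never need to be held in memory all at once. The inferred records have to be filtered locally because they never pass through the remote service.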

Visualization
These complex data require technical sophistication to navigate. To increase accessibility, visualization tools help start temporal, spatial, and species comparisons with deep linking, coordinated highlighting, separated color channels, summary statistics, and side-by-side display (Few, 2010). To support learning this UI, an optional introduction sequence tutorializes a "real" analysis via Hayashida design (Brown, 2015; Nutt & Hayashida, 2012):
• Introduction: The tool shows information about Pacific cod, gradually fading in the pre-filled controls used to achieve that analysis and asking the user for minor modifications.
• Development: Using the mechanics introduced moments prior, the tool invites the user to change the analysis to compare different regions.
• Twist: Enabling overlays on the same display, the user leverages mechanics they just exercised in a now more complex interface.
• Conclusion: The visualization invites the user to demonstrate skills acquired in a new problem.
This visualization also serves as a starting point for continued analysis by generating either CSV or Python code to take work into other tools.
In addition to use in a graduate classroom setting, five individuals with relevant background offered feedback on this open source visualization, four of them aided by a think-aloud prompt (Lewis, 1982).

Limitations
As further documented in the repository (Pottinger, 2023a), these tools:
• Run single-threaded and synchronous.
• Aggregate hauls as points in visualization due to data limitation.
• Ignore hauls if entirely excluded by NOAA.