ESP3: An open-source software for the quantitative processing of hydro-acoustic data

ESP3 is an open-source software to process single-beam and split-beam echosounder data. Multiple displays, analysis tools and parameterizable algorithms are available to the user to scrutinise their data, and a scripting module allows applying these to entire surveys in batch processing. The software infrastructure is designed to handle large datasets with efficiency and consistency. With ESP3, one can implement robust workflows combining automated methods and expert decision-making to produce quantitative analysis of acoustic backscatter. While originally designed to process acoustic surveys for fish biomass estimation, ESP3 has also been used for studies of marine ecosystems and marine geophysical applications.


Motivation and significance
Acoustics are a standard technique to assess the distribution and abundance of fish and zooplankton [1]. Active acoustic instruments such as single-beam and split-beam echosounders produce a view of a vertical slice of the water column which can be used to characterise its biological content. These systems are now found on most research and fishing vessels, yet there are relatively few software packages available to process their data for quantitative analysis of acoustic backscatter (see Table 1). Those distributed under an open-source licence lack a fully developed user interface that could allow easy data scrutiny and processing by users unfamiliar with a coding environment. ESP3 (Echo Sounder Package) is the third iteration of a software written and maintained at the National Institute of Water and Atmospheric Research (NIWA) to process fisheries acoustics data (see Fig. 1). Like its previous versions, ESP3 was designed around the need to run processing scripts in order to fulfil fisheries research requirements of reproducibility and consistency. Under continuous and active development, it has since evolved into an open-source platform for extracting quantitative measurements from echosounder data, irrespective of the field of application. The open-source approach guarantees transparency, which enables the comparison of quantitative studies across institutes.
ESP3 is used primarily to process acoustic data from dedicated fisheries surveys (e.g., [2][3][4]), following the well-established methodology of "echo-integration", which allows the estimation of fish biomass from the computed area backscattering coefficient S_a (m² m⁻²) [1]. The software includes several algorithms replicating methodology from relevant literature and others developed at NIWA (see Table 2) to perform a range of tasks, such as excluding common artefacts (e.g. signal drop-out, interference) and designating specific portions of the water column for analysis.
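As a minimal sketch of the echo-integration idea (not ESP3's implementation), the area backscattering coefficient of one integration cell can be approximated by converting volume backscattering strength from decibels to linear units, averaging over pings, and integrating over depth. The function name, the use of NaN for excluded samples, and the example values are assumptions made for illustration.

```python
import numpy as np


def area_backscattering_coefficient(sv_db, sample_thickness_m):
    """Approximate the area backscattering coefficient (m^2 m^-2) of one cell.

    sv_db: 2-D array (samples x pings) of volume backscattering strength
           in dB re 1 m^-1. Excluded samples are assumed to be NaN.
    sample_thickness_m: vertical extent of one sample, in metres.
    """
    # Convert from decibels to linear volume backscattering coefficient.
    sv_linear = 10.0 ** (np.asarray(sv_db, dtype=float) / 10.0)
    # Average across pings at each depth, ignoring excluded (NaN) samples.
    mean_sv_per_depth = np.nanmean(sv_linear, axis=1)
    # Integrate over depth (rectangle rule).
    return float(np.nansum(mean_sv_per_depth) * sample_thickness_m)


# Hypothetical cell: 100 samples of 0.1 m, 5 pings, uniform -60 dB.
cell = np.full((100, 5), -60.0)
s_a = area_backscattering_coefficient(cell, 0.1)
```

In this hypothetical cell the linear coefficient is 1e-6 m⁻¹ over a 10 m layer, so the integral is 1e-5 m² m⁻².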

Typical workflow
In ESP3, processing data from a hydro-acoustic survey usually follows a standard workflow (see Fig. 2). First, the user needs to populate the files' metadata from the survey design (i.e. snapshot number, stratum name, transect type, and transect number). Next, data are pre-processed in a manual, semi- or fully automated fashion, depending on the volume of data, complexity of the analysis, and the need for expert input. This process typically includes denoising, defining the regions of interest, and defining the samples to exclude from analysis (e.g. bottom echo, bad pings, noise spikes, etc.). XML scripts detailing the parameters and inputs for the echo-integration can then be generated from the software, and edited by the user if necessary. The use of scripts ensures repeatability and reproducibility of the echo-integration results. Batch processing with scripts creates standardised output in self-descriptive .csv and .xlsx files, which can then be imported into external packages or software for fisheries stock assessment (e.g., [5,6]).

Software description
ESP3 provides a graphical user interface allowing the user to visualise and process hydro-acoustic data from common split-beam and single-beam echosounders. Supported file formats include Simrad *.raw files, ASL *.01A files, and Furuno FCV-30 *.dat files. In this context, hydro-acoustic data are defined as a series of "pings", which are short temporal quadrature signals (IQ) received from an instrument in response to a known transmitted acoustic pulse, after propagation through the water column and backscattering from reflectors situated in the water column and the seafloor. Each short time series finishes when the next pulse is transmitted, and the accumulation of consecutive time series is called an "echogram". Initially, the IQ time series need to be converted to physically meaningful (acoustic) quantities as described in [1,7,8], and geographically referenced. These samples are then associated with a time and a specific geographical location, resulting in a time series S(t, lat, lon, depth) for all physical quantities defined. In this paper, data cleaning refers to the process of excluding portions of the signal from further analysis, by defining them either as empty water (0) or as NaN ("Not a Number"). Data annotating refers to the process of defining regions of interest in an echogram and attributing them to a specified class.
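The relation between pings, samples and the echogram can be sketched as follows. This is an illustration only: the function names are hypothetical, the nominal sound speed of 1500 m/s is an assumption, and the range calculation uses the basic two-way travel time relation r = c t / 2 while ignoring transducer depth and instrument-specific delays.

```python
import numpy as np


def sample_ranges(n_samples, sample_interval_s, sound_speed_ms=1500.0):
    """Range (m) of each sample from the two-way travel time: r = c * t / 2."""
    travel_time_s = np.arange(n_samples) * sample_interval_s
    return sound_speed_ms * travel_time_s / 2.0


def build_echogram(pings):
    """Stack equal-length pings side by side: rows are samples, columns are pings."""
    return np.column_stack(pings)


# Three hypothetical pings of four samples each.
pings = [np.zeros(4), np.ones(4), np.full(4, 2.0)]
echogram = build_echogram(pings)
ranges = sample_ranges(n_samples=4, sample_interval_s=1e-3)
```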

Software installation
ESP3 is written in MATLAB®. With the source code, it can be run from a standard MATLAB environment provided the appropriate version and toolboxes are installed. The source code is under Git version control. Periodically, a compiled version is created out of the latest stable release. This is an individual ESP3 application for Windows (64-bit platforms) that can be run without the MATLAB software or licence and only requires the user to install the appropriate version of the (free) MATLAB Compiler Runtime. The source code version and the compiled version operate in the same manner except for the installation and software start procedure.

Software architecture
The software is designed to make full use of MATLAB's object-oriented capabilities, relying on some of MATLAB's built-in graphical objects and defining new custom classes to fit the context of hydro-acoustic data (Fig. 3). Starting the software creates a single instance of the esp3_cl class, the properties of which include the software's main figure. When the user imports a data file, one or several instances of the layer_cl class are created and added as properties of the esp3_cl instance, and the data are parsed to create further class instances as properties. Crucially for the survey processing workflow, data files that have been allocated the same metadata are stored in the same layer_cl object when loaded so that they can easily be displayed and processed together. Since hydro-acoustic data files often contain data coming from several channels with different operating frequencies, several transceiver_cl objects are created for each layer. The voluminous parts of the data (i.e. full time series of acoustic signals) are stored in memory-mapped binary files (using MATLAB memmapfile objects encapsulated in an ac_data_cl object), allowing for the rapid loading of large datasets of several terabytes without filling the system's RAM. The results of the cleaning and annotating process for each channel are stored in instances of the region_cl and bottom_cl classes, which are added as properties of the relevant transceiver_cl object.

Graphical interface
ESP3's graphical interface provides interactive and intuitive access to all available tools and algorithms to help the user scrutinise the data. The graphical interface is arranged into three main panels: the Control Panel, the Algorithms Panel, and the Main Panel (Fig. 1).
The Control Panel consists of a series of tabs that provide access to different data types and channels, display settings, an overview of the echogram shown on the Main Panel, functionalities for data management, map (geographical) display, data processing options, the calibration tool, listing and handling of regions, multi-frequency analysis and display, single target and tracked target detection results, line import/export options, and definition of the environmental parameters that determine the sound speed and absorption coefficients.
On the Algorithms Panel, the parameters of the algorithms used for data scrutiny or analysis are defined. From these tabs, the algorithms can be directly applied on the data currently displayed on the main panel. They can also be applied to a batch of layers from the processing tab on the Control Panel.
The Main Panel displays the acoustic data (Fig. 1) and is used to: (i) define (annotate) regions (information stored in a region_cl object) around marks of interest such as fish schools; (ii) define (clean) areas, pings or samples to be removed from analysis, such as the data below the bottom represented as a black line (information stored in a bottom_cl object); (iii) inspect the result of algorithms applied to the data in order to fine-tune their parameters. From the Main Panel, various tasks can be invoked (e.g. application of algorithms, metadata editing, data exploration, and data exports) through the use of contextual menus that appear when right-clicking on the relevant objects in the display. This configuration is complemented at the top with a Menu Bar and a Tools Bar, allowing quick access to tools and functionalities, and at the bottom with a Metadata Panel, which updates interactively and provides information on the currently displayed file, the sample at which the mouse cursor currently points, and current ESP3 processing.
Echograms of all available channels (information stored in a transceiver_cl object) in the data file (information stored in a layer_cl object) are displayed in a separate window titled "All Channels" (Fig. 1).

Software functionalities
ESP3 offers many tools to visualise and scrutinise the data, including display settings and tools to navigate through the data rapidly, and additional windows and displays to provide further information, such as a map to provide the spatial context of the experiment/survey. In the following sections, we describe some of the more advanced tools for scrutinising and processing the data.

Table 2
Algorithm | Description | Type | Reference
... | ... | Annotate | [17]
Bad pings detection | Detection of corrupted pings (multiple criteria available), to be removed from further analysis. | Clean | ESP3 documentation
Spikes detection | Detection of samples contaminated by short bursts of noise (usually due to interference), to be removed from further analysis. | Clean | Undocumented
Dropouts detection | Detection of pings experiencing a drop in signal level, to be removed from further analysis. | Clean | Undocumented
Denoising | Removal of background noise and estimation of signal-to-noise ratio. | Clean | [18]
School detection | Implementation of the shoal analysis and patch estimation system algorithm. | Annotate | [19]
School classification | Classification of regions or integration cells based on a user-defined classification tree. | Annotate | Undocumented
Single target detection | Detection of isolated targets (usually, single fish), based on signal characteristics. | Annotate | [20]
Single target tracking | Tracking of single targets in space and time. | Annotate | [21,22]

Metadata management
For each folder containing hydro-acoustic data files, an SQLite database is automatically generated when a file is opened for the first time to hold the metadata for each file contained in the folder (the database structure is described in Table 3). Some attributes are automatically populated using information rapidly obtainable from each file (e.g. Filename, StartTime, EndTime in the logbook table). Additional attributes are populated automatically when the corresponding files are opened in ESP3 (e.g. attributes of the ping_data table). Other attributes can only be populated by the user via the ESP3 interface (e.g. attributes of the survey table). The attributes in the survey table are applicable to all files in the folder, so the user is requested to populate these only once. For each file, the user can fill the Snapshot, Stratum, Type, and Transect attributes of the logbook table to reflect the experimental design of the survey.
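The two-stage population of the logbook table can be sketched with Python's sqlite3 module. This is an illustration, not the ESP3 schema: only the attribute names mentioned in the text are used, and the column types, the primary key, the helper names and the example values are all assumptions.

```python
import sqlite3


def create_logbook(conn):
    """Create a simplified logbook table (column types are assumed)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS logbook (
               Filename  TEXT PRIMARY KEY,
               StartTime TEXT,
               EndTime   TEXT,
               Snapshot  INTEGER,
               Stratum   TEXT,
               Type      TEXT,
               Transect  INTEGER)"""
    )


def register_file(conn, filename, start_time, end_time):
    """Stage 1: attributes that can be read quickly from the file itself."""
    conn.execute(
        "INSERT OR IGNORE INTO logbook (Filename, StartTime, EndTime) VALUES (?, ?, ?)",
        (filename, start_time, end_time),
    )


def set_survey_design(conn, filename, snapshot, stratum, transect_type, transect):
    """Stage 2: survey design attributes supplied by the user."""
    conn.execute(
        "UPDATE logbook SET Snapshot = ?, Stratum = ?, Type = ?, Transect = ? "
        "WHERE Filename = ?",
        (snapshot, stratum, transect_type, transect, filename),
    )


conn = sqlite3.connect(":memory:")  # a real database would live in the data folder
create_logbook(conn)
register_file(conn, "example.raw", "2019-08-28T00:00:00", "2019-08-28T01:00:00")
set_survey_design(conn, "example.raw", 1, "A", "Acoustic", 3)
row = conn.execute(
    "SELECT Snapshot, Stratum, Type, Transect FROM logbook WHERE Filename = ?",
    ("example.raw",),
).fetchone()
```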

Environmental and calibration tools
To allow quantitative analysis of hydro-acoustic data, it is necessary to compute accurate calibration values for an acoustic instrument [23]. The appropriate environmental parameters (i.e. temperature and salinity) at the time of the calibration must be used to accurately estimate the water's physical characteristics that affect the signal level (absorption and sound speed) [24][25][26]. ESP3 includes a calibration tool to compute and save calibration values from target sphere measurements. Calibration values and environmental parameters can also be imported and applied via the Calibration and Environment tabs, respectively, of the interface.
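To illustrate how environmental parameters feed into the sound speed, the sketch below uses the simplified equation of Medwin (1975). This formula is chosen here only for its brevity; it is not necessarily the one used by ESP3 or by references [24][25][26], and it is intended for a limited range of temperature, salinity and depth. The function name and example values are hypothetical.

```python
def sound_speed_medwin(temperature_c, salinity_psu, depth_m):
    """Sound speed in sea water (m/s), simplified equation of Medwin (1975).

    temperature_c: water temperature in degrees Celsius
    salinity_psu:  salinity in practical salinity units
    depth_m:       depth in metres
    """
    t, s, z = temperature_c, salinity_psu, depth_m
    return (
        1449.2
        + 4.6 * t
        - 0.055 * t**2
        + 0.00029 * t**3
        + (1.34 - 0.010 * t) * (s - 35.0)
        + 0.016 * z
    )


# Hypothetical calibration conditions: 10 degrees C, salinity 35, at the surface.
c_surface = sound_speed_medwin(10.0, 35.0, 0.0)
```

Because the sound speed sets the conversion from travel time to range, an error in temperature or salinity propagates into the range of every sample and into the calibration values derived from them.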

Manual cleaning and annotating tools
ESP3 offers tools to interactively define:
• "Bad Pings": pings to be ignored in further analysis (treated as NaN).
• The "Bottom Line": the sample in each ping under which all samples are to be ignored, usually including the seafloor echo (data below the Bottom Line treated as NaN).
• "Regions": areas of interest, tagged either as:
  • Data: region to be annotated and included in the analysis, or
  • Bad Data: region to be excluded from analysis (treated as empty water).
Data cleaning and annotating do not modify the acoustic data stored in ac_data_cl objects. Instead, ESP3 stores the information on the "status" of the data either in object properties or in a region_cl or bottom_cl object. Those objects can be saved externally as XML files, which will be automatically reloaded the next time the same layers are opened. These objects can also be archived in an SQLite database to allow the user to store and reuse multiple versions of cleaning and annotating, for example if the user wishes to back up and safeguard the results of an annotation session, or if different annotations are required for different data uses.
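The non-destructive nature of cleaning can be sketched with masks applied to a copy of the data at analysis time. This illustrates the principle only and is not ESP3's implementation: the function and argument names are hypothetical, and treating the bottom sample itself as excluded is an assumption.

```python
import numpy as np


def apply_cleaning(sv_linear, bad_pings, bottom_sample, bad_data_mask):
    """Return a cleaned copy of the data; the input array is left untouched.

    sv_linear:     (samples x pings) backscatter in linear units
    bad_pings:     boolean array, one entry per ping
    bottom_sample: index of the Bottom Line in each ping
    bad_data_mask: boolean (samples x pings) mask of Bad Data regions
    """
    cleaned = np.array(sv_linear, dtype=float, copy=True)
    # Bad Data regions are treated as empty water.
    cleaned[bad_data_mask] = 0.0
    # Bad Pings are treated as NaN.
    cleaned[:, bad_pings] = np.nan
    # Samples at or below the Bottom Line are treated as NaN (assumption:
    # the bottom sample itself is excluded).
    sample_index = np.arange(cleaned.shape[0])[:, None]
    cleaned[sample_index >= np.asarray(bottom_sample)[None, :]] = np.nan
    return cleaned


# Hypothetical 4-sample, 3-ping echogram.
raw = np.ones((4, 3))
bad_region = np.zeros((4, 3), dtype=bool)
bad_region[0, 0] = True
cleaned = apply_cleaning(
    raw,
    bad_pings=np.array([False, True, False]),
    bottom_sample=np.array([2, 4, 3]),
    bad_data_mask=bad_region,
)
```

Keeping the masks separate from the samples means that several alternative cleaning versions can be stored and swapped without ever rewriting the raw data.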

Table 3
Structure of the SQLite database automatically generated.

Algorithms
Algorithms (Table 2) allow users to automatically clean and annotate the data before scrutinising it manually. Each algorithm is controlled by several editable parameters allowing the user to test and fine-tune the algorithm for a specific case. Once appropriate parameters have been found, these settings can be saved (stored in the .\config folder of the installation directory) for future use. Most algorithms can be applied at various scales: on a subset of the currently displayed layer (via a region or a selection box), on the currently displayed layer (via the Algorithms Panel, or the Processing tab of the Control Panel), or on all currently loaded layers or files (as before, or via a script as described below).

Automated processing using scripting
The repeatability of results relies on tracking and controlling the processing that has been applied to a set of files. In ESP3, this is achieved with echo-integration scripts: XML files that specify the processing to be applied. Echo-integration scripts can be written by the user manually, or created using the ESP3 scriptbuilder tool (in the Scripting menu). Running a script creates a survey_input_cl object (see Fig. 3), which applies the analysis to the main esp3_cl object, populates the results in a survey_output_cl object, and exports it into .csv and .xlsx files readable by other packages.

Illustrative examples
We illustrate the use of ESP3 by echo-integrating one snapshot (i.e. a set of random transects covering the survey area) of the acoustic survey of spawning southern blue whiting (SBW) (Micromesistius australis) on the Campbell Island Rise, New Zealand, which took place from 28 August to 25 September 2019 aboard RV Tangaroa (NIWA voyage code TAN1905). The survey aimed at estimating the relative abundance of adult SBW and predicting pre-recruit numbers into the stock, which are used to inform decisions made by the New Zealand government to set commercial catch quota. The survey covers a wide area (30,000 km²) and is divided into strata, with a total of 54 random transects for this snapshot.
The hydro-acoustic data were first cleaned in a semi-automated fashion (i.e. fine-tuning and applying bottom detection and bad pings detection algorithms, then visually inspecting the results and editing if necessary). Then, the data were manually annotated to define SBW marks. The results of this cleaning and annotating were then saved and a script was generated (see Fig. 4) to run the echo-integration.
Results from the echo-integration can be visualised within the software as a distribution map or graph of relative abundance to check for possible issues before further analysis (Fig. 5). Integrated backscatter can then be converted to biomass using the ratio of mean weight to mean backscattering cross-section for SBW (computed from length-frequency samples from trawl catches during the survey).
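The conversion from integrated backscatter to biomass can be sketched as a simple scaling. This is a simplified, single-area illustration and not the survey's estimation procedure, which combines transects and strata. The function name and all numerical values are hypothetical and are not results from the SBW survey.

```python
def biomass_tonnes(area_backscatter, mean_weight_kg, mean_sigma_bs_m2, area_km2):
    """Convert mean area backscattering coefficient to biomass in tonnes.

    area_backscatter: mean area backscattering coefficient (m^2 m^-2)
    mean_weight_kg:   mean fish weight (kg)
    mean_sigma_bs_m2: mean backscattering cross-section per fish (m^2)
    area_km2:         surveyed area (km^2)
    """
    # Backscatter per unit area divided by backscatter per fish gives fish
    # per m^2; multiplying by mean weight gives kg per m^2.
    density_kg_per_m2 = area_backscatter * mean_weight_kg / mean_sigma_bs_m2
    # Scale to the whole area (1 km^2 = 1e6 m^2) and convert kg to tonnes.
    return density_kg_per_m2 * area_km2 * 1e6 / 1000.0


# Hypothetical values chosen only to make the arithmetic easy to follow.
example_biomass = biomass_tonnes(
    area_backscatter=1e-6, mean_weight_kg=0.5, mean_sigma_bs_m2=1e-4, area_km2=100.0
)
```

With these made-up inputs the density is 0.005 kg per m², which over 100 km² amounts to 500 tonnes.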

Impact
ESP3 allows researchers to scrutinise and process echosounder data collected from research and commercial vessels. By providing the only open-source software with a graphical interface, ESP3 significantly increases the uptake and potential uses of those datasets. Likewise, it provides a cost-effective alternative to proprietary software for less affluent institutions interested in learning and using hydro-acoustics. ESP3 was first released to the public in March 2017 and has already generated considerable interest. At the time of writing this article, it has been downloaded more than 1600 times from 63 countries (see footnote 1). Multiple research institutions and universities have contacted the development team for assistance or to suggest modifications. The first training course on the use of ESP3, held at the annual meeting of the International Council for the Exploration of the Sea (ICES) Working Group on Fisheries Acoustics, Science and Technology (WGFAST), was attended by researchers and students from seven countries. At NIWA, ESP3 is used to process all hydro-acoustic data from fisheries acoustics surveys, supporting a broad range of research in fisheries and ecology (e.g. [27][28][29]). It is also actively used in geophysics to cross-calibrate seafloor backscatter measurements from multibeam echosounders (in a habitat mapping context [30]), and to compute bubble size distribution from seeps using broadband measurements to estimate volumes of CH₄ and CO₂ released from the seafloor [31].

Conclusions
ESP3 is an open-source tool for the quantitative processing of echosounder data. The software is under continuous and active development. New functionalities are frequently added to meet further research requirements and the source code is regularly improved to accelerate workflows and handle datasets of ever-increasing size. In recent years, fisheries acoustics manufacturers have developed new broadband systems (e.g. SIMRAD EK80) that generate a dramatically larger volume of data, forcing software developers to optimise processing workflows and build better supervised or non-supervised methods for processing datasets. The development of ESP3 has followed this trend, resulting in a software that can successfully handle very large datasets and apply complex algorithms efficiently. Planned future developments include machine learning approaches for data processing to further speed up, automate and standardise the cleaning and annotating process, using the large datasets already processed with ESP3 and stored at NIWA for the training and validation of such algorithms.

1 https://sourceforge.net/projects/esp3/

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.