The CALFISH database: A century of California's non-confidential fisheries landings and participation data

to 2019; (2) annual number of commercial fishing vessels by length class from 1934 to 2020; (3) annual number of licensed commercial fishers by area of residence from 1916 to 2020; and (4) annual number of party boat (CPFV) vessels, anglers, and their total catch by species from 1936 to 2020. Notably, we harmonized port names, species common names, and species scientific names across all years and datasets. We make these curated datasets, collectively called the CALFISH database, publicly available to any interested stakeholder in the supplementary materials of this paper, on an open-access data repository, and in the wcfish R package. These datasets can be used (1) to understand the historical context of California's fisheries; (2) for original research requiring only summaries of historical landings and participation data; and (3) to anticipate the likely characteristics of confidential data requested from the state. We conclude the paper by identifying key principles for increasing the accessibility and utility of historical fisheries landings and participation data.


Introduction
California's seafood industry supports vibrant coastal economies encompassing harvesters (fishers and farmers), processors, distributors, importers, restaurants, tourism, and retail. In 2016, California's commercial seafood industry generated more jobs, income, and sales than any other U.S. state and its recreational fisheries generated the second largest economic and employment impacts after Florida (NMFS, 2018). When excluding the import industry, California's commercial fisheries generated 14,900 jobs representing ~4100 harvesters, ~1700 processors, ~600 distributors, and ~8500 retailers (NMFS, 2018). Its recreational fisheries supported an additional ~17,000 jobs (NMFS, 2018) resulting from both trip expenditures (i.e., costs of fishing from for-hire vessels, private boats, or shore, including fuel, bait, ice, and charter/guide fees) and durable expenditures (i.e., costs of equipment used for fishing). California's fisheries also provide sustainable, nutritious, and often affordable food to both local consumers (Quimby et al., 2020) and regional to global markets (CDFW, 2015).
Maintaining California's fishing communities into the future depends on a detailed understanding of their past. First, assessing the status of fisheries through stock assessment depends on time series of historical catch (Mason, 2010). Second, enhancing the resilience of social-ecological fisheries systems requires understanding the dynamics and consequences of historical environmental, economic, and regulatory shocks. For example, climate change increasingly threatens California's fisheries (Chavez et al., 2017) and understanding the impacts of historical environmental change on the distribution, production, and composition of California's fisheries landings (Selden et al., 2020) is crucial to preparing fisheries science, management, and industries for the future (Chavez et al., 2017). Similarly, increasing the resilience of California's fisheries to market shocks caused by trade wars or a global pandemic requires understanding the consequences and adaptive responses of past shocks (Gephart et al., 2019; White et al., 2021). Lastly, the implementation of new management measures (such as marine protected areas, catch shares, or flexible permits) must consider the impact of historical regulations on both fisher behavior and resource dynamics (Hackett et al., 2015; Kuriyama et al., 2019; Warlick et al., 2018). Thus, access to historical fisheries landings and participation data is critical for both understanding the past and preparing for the future.
The California Department of Fish and Wildlife (CDFW) has been collecting information on California's fisheries since 1916. Commercial fisheries catch is monitored using the landing receipts ("fish tickets") collected and submitted by fish buyers and processors. These receipts report the species, weight, and price of the purchased landings and information on the location of the catch and gear used in capture. Additionally, port samplers often collect information on the species, age, size, sex, and maturity composition of a sample of the landed catch. Landings from recreational fisheries come through a variety of modes (e.g., party boats, private boats, shore, piers, jetties) and are monitored using a variety of instruments. Landings from vessels that take paying customers fishing, known as Commercial Passenger Fishing Vessels (CPFVs), represent the longest and best monitored mode in the recreational sector (Hill and Barnes, 1998; Hill and Schneider, 1999). CPFVs are required to submit logbooks reporting the number of passengers, number of hours fished, location of fishing, and number of fish retained and discarded per trip. Additionally, port samplers and onboard observers often collect information on the species, size, and sex composition of a sample of CPFV landings. Although recreational fishing from private boats, piers, jetties, and shore is more challenging to monitor due to its dispersed nature and private-only access in some cases (e.g., private docks or marinas), landings and discards from these modes were quantified by the Marine Recreational Fisheries Statistical Survey (MRFSS; Hicks et al., 1999) from 1979 to 2003 and have been quantified by the California Recreational Fisheries Survey (CRFS) since 2004 (CDFW, 2017). In general, this monitoring employs intercept surveys, in which samplers interview fishers and identify, measure, and weigh their catch, and, to a lesser extent, telephone surveys to scale the effort of the sampled population to the entire fishery.
CDFW makes its fisheries data available to the public through several pathways, which vary in accessibility and extent. First, data can be released through a public data request, although confidential data (data pertaining to ≤3 fishers, vessels, or businesses) will not be available without sufficient justification and a binding data sharing agreement. These requests take time (weeks to months) and resources to process and may not be necessary for analyses in which only non-confidential data summaries (data pertaining to >3 fishers, vessels, or businesses) are needed. Second, CDFW has published non-confidential summaries of its fisheries data in quasi-annual reports dating back to 1929 (CDFW, 1929). While these reports present a rich history of landings and participation in California's fisheries, the data are spread throughout thousands of tables in hundreds of documents, severely limiting their accessibility to researchers, fishers, and other interested stakeholders. Finally, CDFW and its partner agencies in Oregon and Washington submit detailed data to the online PacFIN (PSMFC, 2021) and RecFIN (PSMFC, 2016) databases, which generate and publish publicly available, non-confidential summaries of commercial and recreational fisheries data, respectively. Although these databases make the data available in machine-readable formats (e.g., CSVs or tab-delimited text files), the length and resolution of these data are often more limited than those published in the CDFW landings reports. For example, the PacFIN data begin in 1980 whereas much of the CDFW data begins in 1928. Similarly, the PacFIN data are often summarized by port complex while the CDFW data are often summarized by individual ports. On the other hand, the RecFIN data are considerably more detailed than the CDFW data, but generally begin in the early 2000s whereas the more generalized CDFW data begin in 1936.
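The confidentiality threshold described above can be illustrated with a minimal Python sketch. It is not CDFW's actual procedure: the function name, the field names, and the choice to pool small ports into an "All Other Ports" row are assumptions made for illustration, and the sketch further assumes that no vessel lands at more than one port.

```python
def summarize_non_confidential(records, min_entities=4):
    """Publish a port only if it has more than 3 contributing vessels.

    Ports at or below the threshold are pooled into an "All Other Ports" row.
    The pooled row is itself withheld if it still falls below the threshold.
    (Hypothetical logic for illustration only.)
    """
    public = []
    pooled_pounds = 0
    pooled_vessels = 0
    for rec in records:
        if rec["n_vessels"] >= min_entities:
            public.append({"port": rec["port"], "pounds": rec["pounds"]})
        else:
            pooled_pounds += rec["pounds"]
            pooled_vessels += rec["n_vessels"]
    if pooled_vessels >= min_entities:
        public.append({"port": "All Other Ports", "pounds": pooled_pounds})
    return public


# Hypothetical landings for one species in one year
landings = [
    {"port": "Port A", "pounds": 5000, "n_vessels": 10},
    {"port": "Port B", "pounds": 300, "n_vessels": 2},
    {"port": "Port C", "pounds": 700, "n_vessels": 3},
]
public = summarize_non_confidential(landings)
```

In this example, Port B and Port C each fall at or below the three-vessel threshold, so only their pooled total would appear in the public summary.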
To improve the accessibility of California's non-confidential fisheries data, we digitized data published in the CDFW landings reports and made these data available in clean, documented, and machine-readable formats to any interested user. We reviewed the 58 landing series reports published by CDFW from 1928 to 2020 and extracted and curated 13 datasets with long time series and wide public interest. In general, these datasets describe landings and participation in commercial fishing and the CPFV sector of recreational fishing (i.e., recreational fishing from private boats and shore are not described in these reports). We rigorously quality controlled all of the extracted data and enhanced the datasets with additional attributes of interest where possible. Notably, these enhancements included harmonizing common names across years and datasets and linking common names with scientific names. We make these datasets, collectively called the CALFISH database, publicly available to any interested stakeholder in the supplementary materials of this paper, in an open-access data repository, and in the wcfish R package. We hope that these datasets will be used (1) to understand the historical context of California's fisheries; (2) for original research requiring only summaries of historical landings and participation data; and (3) to anticipate the likely characteristics of confidential data requested from the state.

Data sources
The California Department of Fish and Wildlife (CDFW) began collecting fisheries landings and participation data in 1916 and began publishing non-confidential summaries of these data in 1929 with "Fish Bulletin 15. The Commercial Fish Catch of California for the Years 1926 and 1927" (CDFW, 1929). The first 38 publications in the commercial fisheries landings series, which present data from 1926 to 1999, were published in the Fish Bulletin. In 2001, the UCSD Scripps Institution of Oceanography Library undertook an enormous effort to scan these publications and provide them as PDFs in their open-access digital library collection (UCSD, 2022). Since 2000, the fisheries landings series has been published on the CDFW "Final California Commercial Landings" website (CDFW, 2020). Throughout this paper, we distinguish between these two sets of publications as the Fish Bulletins (FB) and the website-hosted landings series, respectively. We reviewed these 58 publications (Table 1) and extracted and curated 13 datasets with long time series and wide public interest (Fig. 1). We note that we did not digitize every table presented in the reports and that additional datasets could still be assembled (Table S1). In general, we did not digitize tables that contained sensitive information, exhibited limited temporal coverage, were highly incomplete due to either voluntary reporting or confidential redactions, or represented highly complex digitization efforts that already overlap with the digitization efforts of Mason (2004) and Norton (2015) (see Table S1 for details).

Data collection, quality control, and enrichment
We extracted data of interest using a variety of character recognition and data extraction tools including ABBYY FineReader for Mac (ABBYY Production LLC, 2013), Tabula (Aristarán et al., 2020), and the tabulizer R package (Leeper et al., 2018). The proprietary ABBYY software generally produced better transcriptions than its open-source alternatives, especially for complex or low quality tables; all three methods were more efficient than manual transcription. However, both character recognition and data extraction are imprecise processes: transcription errors were common and we rigorously quality controlled all of the extracted string and numeric data. We quality controlled string data (e.g., port names, species names, etc.) through rigorous data inspection, visualization, and harmonization efforts. We quality controlled numeric data (e.g., pounds of landings, value of landings, number of vessels, number of anglers, etc.) by confirming that calculated row and column totals matched reported row and column totals. For example, if a table reported annual landings of a list of species in a port as well as the total annual landings for the port, we compared the computed total to the reported total, and edited transcription errors until the totals matched. If a table did not include row or column totals, we quality controlled the data through repeat visual inspection; fortunately, this was rare and only occurred for brief tables with low numbers of observations (e.g., number of fishers by year).
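The totals check described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the workflow actually used for CALFISH: the function name, the column names, and the example table are all hypothetical.

```python
def check_totals(rows, reported_totals, tolerance=0):
    """Compare computed column sums against the totals printed in a table.

    Returns a dict mapping each mismatched column to (computed, reported),
    which flags columns that likely contain a transcription error.
    """
    mismatches = {}
    for column, reported in reported_totals.items():
        computed = sum(row[column] for row in rows)
        if abs(computed - reported) > tolerance:
            mismatches[column] = (computed, reported)
    return mismatches


# Hypothetical transcribed table: landings by species for one port and year
rows = [
    {"species": "Sardine", "pounds": 1200, "dollars": 300},
    {"species": "Albacore", "pounds": 800, "dollars": 450},
    {"species": "Rockfish", "pounds": 150, "dollars": 90},
]
# Totals as printed at the bottom of the (hypothetical) source table
reported = {"pounds": 2150, "dollars": 850}

mismatches = check_totals(rows, reported)
```

Here the pounds column reconciles, while the dollars column is flagged because the transcribed values sum to 840 rather than the reported 850, so one of the dollar cells would need to be rechecked against the scanned table.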
In many cases, we added attributes to the extracted datasets that were not included in the original data. For example, we added attributes to detail: (1) the source of the data, including the reference name and table number; (2) the port complex for datasets with a port attribute, using the 1987-2019 nine-region typology (Fig. 2; Fig. S1); (3) harmonized common and scientific names; and (4) landings volumes in kilograms (kg) and/or metric tons (mt). We also added attributes to allow the aggregation of values into categories that consistently occur over long time series. For example, we added a grouping attribute to map finely resolved but inconsistently used vessel length classes into wider length classes that occur over the whole time series (see Fig. 8). To maximize transparency, we indicate which attributes are native to the original data and which attributes were added by us in the metadata for each dataset.
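As a minimal sketch of this kind of enrichment, the Python snippet below appends source and metric-unit attributes to a record while leaving the native attributes untouched. The field names and the example reference and table values are hypothetical placeholders, not the attribute names used in the CALFISH datasets.

```python
LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound in kilograms
KG_PER_MT = 1000.0     # kilograms per metric ton


def enrich(record, reference, table):
    """Return a copy of a landings record with added (non-native) attributes."""
    out = dict(record)           # keep the native attributes unchanged
    out["source"] = reference    # added: report the value was extracted from
    out["table"] = table         # added: table number within that report
    out["landings_kg"] = record["landings_lb"] * LB_TO_KG
    out["landings_mt"] = out["landings_kg"] / KG_PER_MT
    return out


# Hypothetical record; the reference and table labels are placeholders
row = enrich(
    {"species": "Pacific sardine", "year": 1950, "landings_lb": 2000.0},
    reference="Example report",
    table="Table 1",
)
```

Keeping the original pounds alongside the derived kilograms and metric tons mirrors the distinction between native and added attributes described above.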

Data storage
We published the curated CALFISH datasets in three places to ensure their long-term, open-access availability to any interested researcher, fisher, or other stakeholder. First, the datasets are published as Excel files in the online supplemental materials of this paper. Second, the datasets are published on the Dryad open-access data repository (Free et al., 2021). Finally, for R programmers, the datasets are published as part of the wcfish R package (Free, 2021). The wcfish package includes other West Coast fisheries datasets as well as an assortment of functions for processing West Coast fisheries data, including functions to harmonize species common names and scientific names. We plan to update the data in the latter two sources every year after the public CDFW landings report is published.

Fig. 2.
California's commercial fishing ports by port complex. Open circles indicate ports with historical landings data but without landings reported since 2010. Port complexes are delineated by county lines (white lines). Catch is reported by commercial fishing block (gray lines). The three largest ports in terms of average annual landings in each port complex are labelled.

Data overview
The landings datasets curated below describe landings in terms of both volume (pounds) and value (dollars). Landings represent retained catch and do not include catch discarded at sea. Landings values reflect nominal ex-vessel values and have not been adjusted for inflation. Landings volumes are reported "without regard to condition" and reflect the volumes reported on the original landing receipt (i.e., they have not been universally converted to round weights). Although most fish and shellfish are landed in round (whole) condition, some species may be eviscerated (gutted), dressed, or beheaded before being brought ashore, but this is not recorded in the data. This is especially common for barracuda, shark, salmon, sablefish, white seabass, and swordfish. A few market categories do include descriptions of condition (i.e., Pacific herring roe, Pacific herring roe on kelp, Chinook/coho salmon roe, spider/sheep crab claws, and crab claws) but there is no guidance on how to interpret these descriptions. We provide an attribute for condition with four options (roe, roe on kelp, claws, and not specified) but caution against using these attributions without further clarification from CDFW.

Fig. 3.
The evolution of market categories reported in the CDFW commercial landings data summaries. The panels show (A) the number of market categories used over time; (B) the proportion of landings occurring within species-specific market categories over time; and (C) an illustrative example of the evolution of rockfish (Sebastes spp.) market categories over time. In general, the taxonomic resolution of landings data has increased over time. However, we caution that even "species-specific" market categories can include landings of other similar species. The CALCOM database accounts for this by using port sampling data to disaggregate market categories into species-specific landings; these reconstructed landings data are then provided to the PacFIN database and then to the NOAA FOSS database.

The CDFW datasets report landings by market categories that are not always species-specific and that have evolved over time (Fig. 3). Market categories represent the groups used by the fishing industry to sort landings for both reporting and sales, and even categories with species-specific names may include a small percentage of similar species (e.g., the "Bocaccio rockfish" category may include a few other rockfish species). Furthermore, these market categories are described using common names rather than scientific names. Although a table for relating common and scientific names is provided at the beginning of each Fish Bulletin-hosted landings report, the conventions for common names and their alignment with scientific names have varied throughout the landings series. We rigorously harmonized common names across years and datasets and associated common names with updated scientific names with guidance from the Fish Bulletin species tables. To ease analysis, maintain transparency, and allow users to make different decisions regarding species identities, every dataset with species-specific information includes the original common name, the harmonized common name, and the updated scientific name. We also provide a supplemental table for appending additional taxonomic information (i.e., phylogenetic groups and/or commercial categories) to any of the curated datasets. Overall, the landings data include 397 market categories representing 12 phyla, 25 classes, 68 orders, 130 families, and 200 genera. We note that a number of catch reconstruction efforts (Ralston et al., 2010; Shelton et al., 2012) have developed algorithms for disaggregating catch reported in broad market categories into species-specific catch. Although reconstructed catches are not included in the CDFW landings summaries, they are used in the PacFIN and RecFIN databases described in Section 3.5 below.
Finally, many of the datasets published in the landings series describe statistics for individual fishing ports or for groups of fishing ports called "port complexes" (Fig. 2). However, the naming conventions for ports and the delineation of port complexes have varied throughout the landings series. To ease analysis, we harmonized port and port complex attributes across years and datasets. In most cases, harmonizing port names involved straightforward decisions (e.g., "Bay", "Bay (Bodega)", and "Bodega Bay" all refer to Bodega Bay). However, in some cases, nuanced decisions were required. Namely, we decided that references to "Tomales Bay (Marshall)", "Princeton (Half Moon Bay)", and "Point Reyes (Drakes Bay)" imply "Tomales Bay & Marshall", "Princeton & Half Moon Bay", and "Point Reyes & Drakes Bay". This decision was based on the fact that, in some years, statistics are separated for these commonly paired ports.¹ We used slashes to denote grouped ports (e.g., "Tomales Bay/Marshall" indicates both Tomales Bay and Marshall together) in the harmonized port names. We retained the original port name in the curated datasets to make our decisions transparent and to allow users to make different decisions. The geographical delineation of port complexes varied throughout the landings series (Fig. S1) with: 13 complexes defined by county lines in FB 15-44 (1926-1930), 8 complexes defined by natural landmarks in FB 44-49 (1931-1935), 7 complexes defined by county lines in FB 57-173 (1936-1986), and 9 complexes defined by county lines in FB 181 and the website-hosted reports. We used the recent 9-complex typology in the curated datasets but provide a table to summarize data based on the older typologies. This table also includes the coordinates (lat/long) of each port.

¹ Princeton and Half Moon Bay were separated in 1946 (FB 67

Shipments were only reported in FB 57-173 (1936-1986) and were only reported for tuna after 1965.
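The port-name harmonization described in this section can be sketched as a lookup that preserves the original name. The lookup entries below are taken from the examples given in the text; the function and field names are hypothetical, and the full lookup used for CALFISH is not reproduced here.

```python
# Harmonized names for the variants mentioned in the text; slashes denote
# ports that are grouped together in the harmonized name.
PORT_LOOKUP = {
    "Bay": "Bodega Bay",
    "Bay (Bodega)": "Bodega Bay",
    "Bodega Bay": "Bodega Bay",
    "Tomales Bay (Marshall)": "Tomales Bay/Marshall",
    "Princeton (Half Moon Bay)": "Princeton/Half Moon Bay",
    "Point Reyes (Drakes Bay)": "Point Reyes/Drakes Bay",
}


def harmonize_port(record):
    """Add a harmonized port name while retaining the original one."""
    original = record["port"]
    cleaned = " ".join(original.split())  # collapse stray whitespace from OCR
    harmonized = PORT_LOOKUP.get(cleaned, cleaned)  # unknown names pass through
    return {**record, "port_orig": original, "port": harmonized}


example = harmonize_port({"port": "Bay (Bodega)", "year": 1950})
```

Retaining the original name in a separate field keeps each harmonization decision visible and reversible for users who would group the ports differently.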

Annual commercial landings by source and species, 1936-2019
Annual commercial landings by source (e.g., California waters, other U.S. waters, or foreign waters) and species have been published since FB 57 and generally span 1936-2019 (Figs. 4 & 5). Although species-specific totals are available for 1977-1999, they are only presented as total landings and shipments in FB 173 (1977-1986) and as total landings in FB 181 (1987-1999) (i.e., information on the source of the landings is not provided). In all years, landings were reported as coming from California waters, waters north of the state (i.e., Oregon/Washington's waters), and waters south of the state (i.e., Mexico's waters or high seas off Mexico). In some years, landings were reported from other foreign waters including waters of the Central Pacific, South Pacific, Japan, and Africa. Although reported, we were unable to digitize data for 1949-50 (FB 80/86) and 1970 (FB 154) because the tables were too blurry. We do not report data for 1984 because the computed and reported totals do not match (FB 173). Data for 1926-1935 could be extracted from the monthly landings reported in FB 15-49 but these tables were too blurry to digitize accurately and efficiently. FB 181 published total landings and shipments from 1916 to 1999 and we used this data to visually fill missing years (Figs. 4 and 5).

Annual commercial landings by port and species, 1941-2019
Annual commercial landings by port and species were published in FB 59-170 (1941-1976, though 1942 was not included) but were not published in FB 173 or 181. These data were published again in the website-hosted landings series (#60-79; 2000-2019) (Fig. 6). Data were published for the Sacramento Delta region in FB 59-108 (1941-1957) and in the website-hosted landings series (2000-2019) but were not published in FB 63-181 (1958-1999). Species-level landings were reported only in value (dollars) in FB 59-67 (1941-1946) but were reported in both volume (pounds) and value from FB 74 onwards. To preserve confidentiality, landings were often summarized in an "All Other Ports" category. Complete time series are available for 16 ports (from north to south): Crescent City, Eureka, Fort Bragg, Bodega Bay, San Francisco, Santa Cruz, Moss Landing, Monterey, Morro Bay, Avila/Port San Luis, Santa Barbara, Long Beach, Terminal Island, San Pedro, Newport Beach, and San Diego (Fig. S2). Near complete time series are available for an additional 6 ports (from north to south): Trinidad, Oakland, Port Hueneme, Santa Monica, Redondo Beach, and Wilmington (Fig. S2). FB 173 and 181 published species-specific totals by port complex (but not by port) and we used this data to visually fill missing years (Fig. 6).

Annual number of licensed commercial fishers by area of residence and overall
The annual number of licensed commercial fishers participating in California's fisheries was reported for 1916-1999 in FB 49-181 and for 2000-2020 on the CDFW licensing statistics website (CDFW, 2021). The number of commercial fishers by area of residence was additionally reported for 1935-1976 in FB 49-170 (Fig. 7). Information on the nationality and nativity of licensed commercial fishers was reported for 1935-1950 but was not digitized due to sensitivities in using these data. The totals are summarized by license year, which extends from April 1 to March 31 of the following year. For example, totals for 1952-53 (which we represent as 1952 for simplicity) represent totals from April 1, 1952 to March 31, 1953. In general, the area of residence includes seven regions in California (Eureka, Sacramento, San Francisco, Monterey, Santa Barbara, Los Angeles, San Diego), two regions outside the state (OR/WA/AK, Mexico), and an "other" category. In some years, the Eureka region is divided into two regions (Eureka/Del Norte) and in other years the OR/WA/AK and other regions are combined.

Fig. 5.
Annual commercial fisheries landings by broad taxonomic group from 1936 to 2019 based on the "by source" dataset. The "other" category encompasses algae, plants, turtles, and frogs. The "other invertebrate" category includes sea urchins, sea cucumbers, sea stars, and jellyfish, among many others. Data for 1949-50 and 1970 were too blurry to digitize and data for 1984 were not accurately reported in FB 173.

Annual number of commercial fishing vessels by port complex (1934-1956), by length class (1934-1976), and overall (1934-2020)
The annual number of registered commercial fishing vessels, including variable information about the size and spatial distribution of the fleet, was reported in FB 44-181 and on the CDFW licensing statistics website (CDFW, 2021). Statewide totals are published for 1934-2020, statewide totals by length class are published for 1934-1976, and port complex-level totals by length class are published for 1934-1956 (Fig. 8). All three levels of information summarize totals by license year, which extends from April 1 to March 31 of the following year. For example, totals for 1952-53 (which we represent as 1952 for simplicity) represent totals from April 1, 1952 to March 31, 1953. The resolution of the length class data increased in later landings reports (Fig. 8). FB 44-74 (1934-1946) used 5 length classes (15-ft bins capped at 85+ ft), FB 80-153 (1948-1969) used 6 length classes (15-ft bins capped at 100+ ft), and FB 159-170 (1970-1976) used 36 length classes (5-ft bins capped at 181+ ft). Data for 1970 (FB 153/159) were provided in both the 6-class and 36-class formats though we include only the higher-resolution data in our curated dataset. The length distribution of fishing vessels was additionally reported by port complex in FB 44-105 (1934-1956). In general, the port complexes include seven complexes in California (Eureka, Sacramento, San Francisco, Monterey, Santa Barbara, Los Angeles, San Diego), two complexes outside the state (OR/WA/AK, Mexico), and an "other registry" category. In 1934-1935, the Eureka complex was divided into Eureka and Del Norte complexes. Length distribution data were not published for 1936-1938 (FB 57).
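The license-year convention and the regrouping of fine length classes into wider ones can be sketched as follows. This is an illustrative Python sketch only: the wide-class break points and labels are hypothetical stand-ins, and the actual grouping is documented in the metadata of the curated dataset.

```python
import bisect
from datetime import date


def license_year(label):
    """Parse a license-year label such as '1952-53'.

    Returns the starting year used to represent the license year, plus the
    April 1 to March 31 window that it covers.
    """
    start = int(label.split("-")[0])
    return start, date(start, 4, 1), date(start + 1, 3, 31)


# Hypothetical lower bounds (ft) and labels for the wide length classes
WIDE_BREAKS = [25, 40, 55, 70, 85]
WIDE_LABELS = ["under 25", "25-39", "40-54", "55-69", "70-84", "85+"]


def wide_length_class(lower_limit_ft):
    """Map the lower limit of a fine length class to its wide class."""
    return WIDE_LABELS[bisect.bisect_right(WIDE_BREAKS, lower_limit_ft)]


def regroup(counts_by_fine_class):
    """Sum vessel counts keyed by fine-class lower limit into wide classes."""
    totals = {}
    for lower, n_vessels in counts_by_fine_class.items():
        key = wide_length_class(lower)
        totals[key] = totals.get(key, 0) + n_vessels
    return totals


# Hypothetical vessel counts keyed by the lower limit of 5-ft length classes
fine_counts = {25: 10, 30: 12, 35: 8, 40: 5, 85: 2, 120: 1}
wide_counts = regroup(fine_counts)
```

Because every 5-ft class nests inside one wide class in this sketch, the regrouped counts can be compared across periods that used different native resolutions.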

Annual kelp harvest by bed type, 1916-1976
The annual harvest of giant kelp (Macrocystis spp.) from open and leased beds was published in FB 161-170 (Fig. 9). The data published in FB 170 extend from 1916 to 1976 and present annual wet weight harvest in "short" tons (i.e., 1 ton = 2000 lbs). An open bed is available to all commercial kelp harvesters. A leased bed is open only to leaseholders. No data are available from 1921 to 1930. Kelp harvest steadily increased from 1931 to 1976 with an increasing proportion of harvests coming from leased beds (Fig. 9).

Annual CPFV landings by species (1936-2019) and port complex (2000-2019)
Annual landings (number of fish caught) from commercial passenger fishing vessels (CPFVs) have been published since FB 95 and span 1936 to 2019 (Fig. 10). The website-hosted landings series distinguish landings from eleven port sub-complexes (e.g., Avila Beach-Morro Bay, Princeton-Bodega Bay, Oceanside-Dana Harbor) from 2000 to 2019. We added an attribute to the data to group these sub-complexes within the port complexes defined in the 1987-2019 nine-region typology (Fig. S1) except, in this case, Eureka and Fort Bragg are combined into a single region named Eureka. The CPFV fleet was inactive during World War II (1941-1945) due to safety restrictions and gas rationing (CDFW, 1945) and no CPFV landings are reported for this period. The data describe landings from 43 market categories including rockfish, flatfish, roundfish (cabezon, lingcod, greenling), and highly migratory species (tunas, dolphinfish, blue shark, yellowtail) (Table S2).

The landings reports (1941-1946) in which volumes were reported at the port- rather than species-level (values were still reported at species-level).

Data limitations
There are six key limitations to the non-confidential CDFW datasets that must be considered before they can be interpreted and used correctly. First, the commercial landings datasets only describe retained catch; they do not describe catch discarded at sea. This implies that the landings data underestimate total fishing mortality, especially for fish and invertebrates with high discard rates and discard mortality. Second, the landings datasets do not always separate landings and shipments and do not always distinguish between landings sourced from California and waters outside the state. This means that users must be careful when attempting to bound studies to California fisheries only. Third, the landings datasets report landings "without regard to condition," which implies that they underestimate the round weight biomass of fish and invertebrates removed by fisheries, especially for species that are frequently processed at sea. Fourth, the datasets record landings in market categories that are not species-specific and that have evolved over time (Fig. 3). This presents challenges in accurately extracting long time series of species-specific landings. Users will need to carefully consider market categories that may encompass, but not specify, their species of interest. The PacFIN and RecFIN databases (described in detail below) use empirical catch reconstructions to disaggregate many of the generic market categories into species-specific catch beginning in 1980 and users should consider using this data for more recent years. Fifth, unlike the PacFIN and RecFIN databases, which are retrospectively updated when errors are discovered, the datasets digitized from the published CDFW landings series are not amended. Thus, it is preferable to use PacFIN or RecFIN data when possible (e.g., 1980 forward). Lastly, the preparation of non-confidential data summaries necessarily requires loss of information, and thus, for some applications, users should consider the non-confidential summaries a stepping stone before using more highly resolved data obtained through a data request. See the sections above for dataset-specific guidance on key data limitations and the section below for guidance on alternative sources of non-confidential fisheries data that can help to overcome some of these limitations.

Other sources of non-confidential historical landings data
The National Oceanic and Atmospheric Administration (NOAA) publishes annual summaries of commercial and recreational (CPFV only) landings for every U.S. state in its Fisheries of the United States reports (e.g., NMFS, 2020) and makes much of this data available through the Fisheries One Stop Shop (FOSS) database (NOAA, 2021). The data extend from 1950 to present and are presented as statewide totals. The data presented in the FOSS database differ from the data presented in the CDFW landings reports in two key ways (Fig. S3): (1) the CDFW data report landings in the weights on the original landing receipt whereas the FOSS database describes the landings of bivalve and univalve mollusks such as clams, mussels, oysters, scallops, and snails in meat weights (i.e., the shell weight is excluded); (2) the CDFW reports inconsistently publish the harvest of kelp and the production of farmed clams, mussels, and oysters whereas the FOSS database publishes this information every year. NOAA reports that FOSS data may also differ from CDFW data because of differences in round weight conversion factors and decisions about confidentiality preservation. In our opinion, the FOSS database ranks high in its ease of use because it (1) allows the user to download all data; (2) presents the data in a tidy, long-format, machine-readable table; and (3) harmonizes species common names and provides species scientific names.

Fig. 8.
[...] and (c) 5-ft bins capped with a 181+ ft. category (1970-1976). The shading indicates the lower limit of each length class (i.e., 25 ft. for the 25-39 ft. length class). No length class information is available for 1936-1938 (FB 57), 1977-1999 (FB 181), or 2000-2020 (CDFW, 2021). The size distribution of vessels is reported by port complex from 1934 to 1956 and as statewide totals from 1957 to 1976.

The Pacific Fisheries Information Network (PacFIN) publishes twenty highly useful, non-confidential datasets (Table S3) on commercial fisheries in its publicly accessible (i.e., no login credentials required) data portal (PSMFC, 2021). The data extend from 1980 to present and are provided at a variety of geographic scales (e.g., totals by region, state, port complex, or management area). In many cases, the landings data can be provided either as round weights (a.k.a., live weight) or in the units of the original landings (e.g., weight of fillets, heads, claws, etc.). Furthermore, the market categories provided in the PacFIN landings data are disaggregated into as highly resolved taxonomic groups as possible using state-run catch reconstruction algorithms based on port sampling data (e.g., Ralston et al., 2010). In our opinion, the publicly accessible PacFIN data portal, while rich in information, ranks low-to-medium in its ease of use because it (1) rarely allows the user to download all data and (2) presents the data in a wide multi-header format that is difficult to analyze without considerable data wrangling. However, it is worth noting that the data (e.g., port names, area names, common names, scientific names) are well-harmonized across datasets and years.
The Pacific Coast Recreational Fisheries Information Network (RecFIN) publishes six highly useful, detailed, and non-confidential datasets (Table S3) on recreational fisheries in its publicly accessible (i.e., no login credentials required) data portal (PSMFC, 2016). The majority of the datasets extend from 2005 to present (the salmon datasets extend from 1976 to present) and are provided at a variety of geographical scales (e.g., totals by state, district, or water area). Collectively, the datasets describe the amount of effort (e.g., number of anglers, vessels, or trips), amount of catch (e.g., number of retained fish, live discards, or dead discards), and characteristics of the catch (e.g., size, weight, sex) in California, Oregon, and Washington's recreational fisheries. These metrics are often attributed by recreational mode (i.e., party/charter, private boat, man-made structure, shore), water area (e.g., ocean offshore, ocean inshore, estuary, river, inland, etc.), or trip type (e.g., bottomfish, highly migratory, halibut, etc.). The temporal and spatial resolution of the data vary by state and dataset. The data portal is identical to the PacFIN portal in design and exhibits the same strengths and weaknesses, i.e., the data are well-harmonized but are provided in wide-format and are difficult to download in large quantities.
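To illustrate the kind of data wrangling that a wide multi-header table requires, the following minimal sketch reshapes such a table into long format using only the Python standard library. The file layout, column names, and values are hypothetical and were invented for illustration; they are not the actual structure of any PacFIN or RecFIN export.

```python
import csv
import io

# Hypothetical wide export: the first header row holds years, the second holds
# metrics, and each data row is one species. This layout is an assumption made
# for illustration, not the actual PacFIN or RecFIN file structure.
WIDE_CSV = """\
species,2018,2018,2019,2019
,landings_mt,value_usd,landings_mt,value_usd
Dungeness crab,100,500,120,650
Market squid,300,200,250,180
"""


def wide_to_long(text):
    """Reshape a two-row-header table into one row per species, year, and metric."""
    rows = list(csv.reader(io.StringIO(text)))
    years, metrics, body = rows[0], rows[1], rows[2:]
    long_rows = []
    for record in body:
        species = record[0]
        # Each column after the first pairs a year (header row 1)
        # with a metric (header row 2).
        for col in range(1, len(record)):
            long_rows.append(
                {
                    "species": species,
                    "year": int(years[col]),
                    "metric": metrics[col],
                    "value": float(record[col]),
                }
            )
    return long_rows


long_table = wide_to_long(WIDE_CSV)
for row in long_table:
    print(row)
```

Each output row carries its own species, year, and metric, so the table can be filtered or summarized without reference to the header layout.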
The California Cooperative Groundfish Survey (CCGS) makes a wealth of commercial fisheries landings, age and length composition, and management regulation data publicly available through its CALCOM database (CCGS, 2019). Importantly, since 1978, the CCGS has sampled commercial landings to identify the species that comprise each market category and to estimate the volume of species-specific catch where possible (Pearson and Erwin, 1997; Ralston et al., 2010; Sen, 1984, 1986). Briefly, the protocol involves sampling a subset of landings from a subset of vessels and applying the species-specific composition of the sampled landings to all of the landings within strata defined by combinations of market category, location (port complex), time (year and quarter), fishing technique (gear type), and landing condition (alive/dead). It is this disaggregated landings data that is curated in the CALCOM database and subsequently submitted to the PacFIN and then the NOAA FOSS databases.
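The expansion step described above can be sketched as a simple proportional allocation within each stratum. This is a simplified illustration, not the CCGS implementation: the stratum keys, species, and weights are invented, and the actual protocol (see the references cited above) may treat unsampled strata and other details differently.

```python
from collections import defaultdict

# Hypothetical stratum: (market category, port complex, year, quarter, gear,
# condition). All names and weights below are invented for illustration.
STRATUM = ("rockfish, unspecified", "Monterey", 1990, 1, "trawl", "dead")

# Hypothetical port samples: (stratum, species, sampled weight in lbs).
samples = [
    (STRATUM, "bocaccio", 60.0),
    (STRATUM, "chilipepper", 40.0),
]

# Hypothetical total landings (lbs) recorded for each stratum.
total_landings = {STRATUM: 10_000.0}


def expand_landings(samples, total_landings):
    """Apply each stratum's sampled species proportions to its total landings."""
    sampled = defaultdict(lambda: defaultdict(float))
    for stratum, species, weight in samples:
        sampled[stratum][species] += weight

    estimates = {}
    for stratum, total in total_landings.items():
        by_species = sampled.get(stratum)
        if not by_species:
            # Unsampled strata are left aggregated in this sketch.
            continue
        sample_total = sum(by_species.values())
        for species, weight in by_species.items():
            estimates[(stratum, species)] = total * weight / sample_total
    return estimates


estimates = expand_landings(samples, total_landings)
for (stratum, species), pounds in estimates.items():
    print(species, pounds)
```

In this toy case, a sample that is 60% bocaccio by weight assigns 60% of the stratum's total landings to bocaccio.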
Lastly, a team of scientists from NOAA's Southwest Fisheries Science Center (SWFSC) digitized much of the Fish Bulletin data describing monthly landings by port complex and species (Fig. S5) and made it publicly available on the NOAA CoastWatch data server (Mason, 2004; Norton, 2015). This represents an enormous effort and a highly valuable product, but we caution that this effort did not represent a literal transcription of the original data. The digitization effort for 1928-2002 performed by Mason (2004) (1) only digitized landings from California waters (i.e., it excluded information on landings from waters outside the state and information on shipments) and (2) summarized data into a six-region system that excludes the Sacramento region and aggregates data reported at finer resolutions (Fig. S1). A similar digitization effort for 2003-2014 performed by Norton (2015) does not currently include landings from the Eureka region.

Conclusions
Increasing the accessibility of non-confidential historical fisheries data to researchers, fishers, and other interested stakeholders is critical to preparing fisheries for environmental, economic, and regulatory shocks. In this paper, we seek to increase the accessibility and utility of California's rich but hitherto obscured non-confidential historical fisheries landings and participation data. We hope that this effort will provide a useful template for future efforts to curate and share historical fisheries data, and we encourage such endeavors (whether state, federal, NGO, or university-led) to adhere to the following six standards when publishing open-access data: (1) provide data in a machine-readable format (e.g., CSV, XLS, TXT; not as a PDF) to ease access; (2) format data as a long-format rectangle (i.e., fully propagated rows and columns) to ease analysis; (3) provide "download all" functionality to ease access to meaningful quantities of data; (4) harmonize categorical data (e.g., port names, common names, scientific names) across years and datasets to ease analysis; (5) provide scientific names to eliminate taxonomic ambiguity; and (6) provide detailed meta-data to describe data contents and limitations. We believe that efforts to publicly provide complete, well-formatted, and well-documented historical fisheries data would dramatically enhance efficiency. First, open-access data would lessen burdens on agency scientists by reducing the number of data requests and by making the remaining data requests more targeted. Second, well-formatted open-access data would increase the efficiency of non-agency scientists by eliminating delays associated with requesting and cleaning non-confidential data or by allowing them to anticipate the stories contained in the data before confidential data are even shared. Overall, increasing efficiency in data provisioning would increase our collective efficiency at finding tractable solutions to pressing fisheries challenges.

Fig. 10. Landings from Commercial Passenger Fishing Vessels (CPFVs) by (A) species group and (B) port complex. In (A), roundfish include lingcod, cabezon, and kelp greenling; highly migratory species include tunas, dolphinfish, blue shark, and yellowtail; and coastal pelagic species include mackerel, barracuda, and bonito (see Table S2 for more details). In (B), the Eureka complex encompasses both the Eureka and Fort Bragg complexes (see Fig. S1 for details) and the Los Angeles port complex encompasses four sub-complexes. CPFVs were not active from 1942 to 1944 due to gas rationing and safety concerns during World War II. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
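As an illustration of standard (4) in the Conclusions, harmonizing categorical data can be as simple as mapping every raw spelling to one agreed name through an explicit lookup table. The sketch below is a hypothetical example using the Python standard library; the raw spellings and the lookup are invented for illustration and do not reproduce the procedure used to build the CALFISH database.

```python
# Hypothetical lookup from raw port names, as they might appear in different
# years, to one harmonized name. The spellings are invented for illustration.
PORT_LOOKUP = {
    "san pedro": "San Pedro",
    "s. pedro": "San Pedro",
    "ft. bragg": "Fort Bragg",
    "fort bragg": "Fort Bragg",
}


def harmonize(raw_name, lookup):
    """Return the harmonized name, or raise if the raw name is unmapped."""
    key = " ".join(raw_name.lower().split())  # normalize case and whitespace
    if key not in lookup:
        raise KeyError(f"No harmonized name for {raw_name!r}")
    return lookup[key]


records = [("Ft.  Bragg", 1950), ("FORT BRAGG", 2005), ("S. Pedro", 1950)]
harmonized = [(harmonize(name, PORT_LOOKUP), year) for name, year in records]
print(harmonized)
```

Raising an error on unmapped names, rather than passing them through silently, is a design choice that forces every new spelling to be reviewed before it enters the harmonized dataset.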

Fig. 1. The temporal coverage of the non-confidential fisheries landings and participation datasets curated for the CALFISH database. Data before 2000 (vertical dashed line) were curated from the Fish Bulletin-hosted landings reports and data after 2000 were curated from the website-hosted landings reports. All data represent annual totals. Note: the number of CPFV vessels, anglers, and landed fish was reported by port complex after 2000.

Fig. 4. Annual commercial fisheries landings and shipments by source from 1936 to 2019. Data for 1949-50 and 1970 were too blurry to digitize and data for were not accurately reported in FB 173. Data were not reported by source in FB 173 or 181. Shipments were only reported in FB 57-173 (1936-1986) and were only reported for tuna after 1965.

Fig. 6. Annual commercial fisheries landings by port complex from 1941 to 2019 based on the 1936-1986 seven-region typology (Fig. S1). Annual commercial landings by port and species are available for 1941-1976 and 2000-2019 as indicated by the horizontal lines. The solid portions of the lines indicate years in which both species-level volumes and values were reported. The dotted portions of the lines indicate years (1941-1946) in which volumes were reported at the port level rather than the species level (values were still reported at the species level).

Fig. 7. The number and proportion of licensed commercial fishers by area of residence from 1916 to 2020. Area of residence data was published for 1935 (partially; FB 49) and 1939-1976 (fully; FB 57-170). Gray bars in other years indicate statewide totals.

Fig. 9. Annual kelp harvest by bed type from 1916 to 1976. No data are available from 1921 to 1930. Data for 1916-1976 are from FB 170, data for 1977-2003 are from the NOAA FOSS database, and data for 2004-2019 were estimated from a figure on the CDFW website using data extraction software. A short ton is equal to 2000 lbs.

Fig. 11. Effort and participation in the Commercial Passenger Fishing Vessels (CPFV) fishery in terms of the (A) number of CPFV-licensed vessels and (B) number of anglers participating on CPFV trips. The Eureka complex encompasses both the Eureka and Fort Bragg complexes (see Fig. S1 for details) and the Los Angeles port complex encompasses four sub-complexes. Although CPFVs were not active during World War II (1942-1944) due to gas rationing and safety concerns, some vessels maintained CPFV fishing licenses during this period. The number of CPFV-licensed vessels was not reported for 1942 (FB 59). In (B), the horizontal lines indicate the years for which angler effort was reported in either days or hours of fishing effort.

Table 1
Sources of public non-confidential California fisheries landings and participation data.
C.M.Free et al.