GLOBE Observer Data: 2016–2019

Abstract
This technical report summarizes the GLOBE Observer data set from 1 April 2016 to 1 December 2019. GLOBE Observer is an ongoing NASA‐sponsored international citizen science project that is part of the larger Global Learning and Observations to Benefit the Environment (GLOBE) Program, which has been in operation since 1995. GLOBE Observer has the greatest number of participants and geographic coverage of the citizen science projects in the Earth Science Division at NASA. Participants use the GLOBE Observer mobile app (launched in 2016) to collect atmospheric, hydrologic, and terrestrial observations. The app connects participants to satellite observations from Aqua, Terra, CALIPSO, GOES, Himawari, and Meteosat. Thirty‐eight thousand participants have contributed 320,000 observations worldwide, including 1,000,000 georeferenced photographs. It would take an individual more than 13 years to replicate this effort. The GLOBE Observer app has substantially increased the spatial extent and sampling density of GLOBE measurements and more than doubled the number of measurements collected through the GLOBE Program. GLOBE Observer data are publicly available (at observer.globe.gov).


Introduction
The Global Learning and Observations to Benefit the Environment (GLOBE) Program is an international science and education program that launched in 1995 (globe.gov) (Berglund, 1999; Finarelli, 1998; Means, 1998; Muller et al., 2015; Nugent, 2018; Rock et al., 1997). GLOBE Observer is a NASA-funded citizen science project that is part of the GLOBE Program (observer.globe.gov). GLOBE Observer was founded to help fill data gaps in the GLOBE database and to expand the educational benefits of the GLOBE Program to broader audiences. A 2015 focus group assessment of GLOBE found that while data collection protocols were rigorous, the resulting environmental data were not as widely used in scientific publications as they could be because of widespread temporal and spatial data gaps. Data were primarily collected at schools, during the school day and school year. By accepting environmental observations from any interested volunteer, GLOBE could fill gaps with data collected outside of school grounds and school hours and gather enough data to achieve meaningful geospatial density. GLOBE Observer supports volunteer data collection through a mobile application ("app"), website, training materials, help desk, and other support materials. GLOBE Observer's primary objectives are to (1) increase GLOBE's spatial and temporal data density; (2) enable an increase of scientific and student research; (3) help volunteer users of all ages be part of the GLOBE community; and (4) increase scientific literacy among all participants.
The GLOBE Program offers more than 50 data collection protocols, while GLOBE Observer includes four: clouds, mosquito habitats, land cover, and tree height. These protocols were selected because specific science communities wanted more of these data and because minimally trained volunteers with simple or no data collection equipment can carry them out well. For each protocol, volunteers collect data that would be prohibitively expensive for a science community to collect because of the desired global distribution and repeat timing. For clouds, land cover, and mosquito habitats, data consist of an observation of conditions (cloud cover, mosquito breeding habitat, etc.) along with supporting photographs stamped with location and time that can be used to verify observations. Tree data provide an estimate of tree height with a georeferenced photograph of the tree.
The NASA GLOBE Observer mobile app was launched in 2016 and was created to broaden the opportunities for the general public to contribute to GLOBE as citizen scientists and increase the spatiotemporal density of observations. Through the GLOBE Observer app, participants in GLOBE countries (globe.gov/globe-community/community-map) can contribute ground-based atmospheric, terrestrial, and hydrologic observations complementing NASA's suite of airborne and spaceborne observing platforms. This paper addresses GLOBE Observer's contribution to data density as a foundation for student and science research and serves as a definition of the data collected during the initial years of the project. Previous publications have analyzed subsets of GLOBE Observer data associated with specific sampling events (Aïkpon et al., 2019; Dodson et al., 2019; Rahman et al., 2019). This paper presents the first summary and analysis of the entire 2016-2019 GLOBE Observer citizen science data set and discusses areas for future improvement.

GLOBE Observer Mobile App
The GLOBE Observer (GO) app is a NASA-funded citizen science app available free of cost in the Apple App Store and Google Play Store (observer.globe.gov/get-the-app). The GO app was designed as a tool for the general public (ages 13 and older). The app is written in JavaScript, HTML, and CSS, uses the AngularJS (2015) framework, and is built with Apache Cordova (2019) for deployment to multiple platforms (i.e., iOS and Android). GO observations are publicly available (at observer.globe.gov). GLOBE has built free online tools to visualize the data on a world map (vis.globe.gov) and to retrieve data as a comma-separated value (CSV) file (datasearch.globe.gov), as well as an application programming interface (API) to facilitate automated or command line data queries (api.globe.gov/search/). Figure 1 shows a flowchart of how the app works and the components of an observation. The GO app currently accepts observations of clouds, mosquito breeding habitats and mosquito type, land cover, and tree heights. There is also a temporary tool for solar eclipses, which was activated in the app for a limited time for the 2017 North American solar eclipse and again for the 2019 South American solar eclipse (GLOBE Observer, 2019). Participants complete required, interactive training in the app to learn how to use the GO tools, after which prompts guide the participant through data collection and submission. For each observation, the device's internal clock and GPS automatically record date, time, latitude, and longitude. Participants answer a series of six yes/no questions about surface conditions that affect satellite retrievals (e.g., presence of snow or ice on the ground) and then photograph and classify what they see (e.g., cloud type). Numerical estimates, such as number of contrails or number of mosquito larvae, are self-reported by the participant and manually entered in the app.
Additional details about the tools in the app are in GLOBE (2019a), and step-by-step instructional videos are publicly available (at observer.globe.gov/doglobe-observer).
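The components of an observation described above can be illustrated with a small record type. The following Python sketch is not the app's data model (the app itself is written in JavaScript); the class name, field names, tool labels, and validation rules are hypothetical and chosen only to mirror the components named in the text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional

# The four tools named in the text; the string labels themselves are hypothetical.
TOOLS = {"clouds", "mosquito_habitats", "land_cover", "tree_height"}


@dataclass
class Observation:
    """Hypothetical record mirroring the components of one app observation."""
    tool: str
    measured_at: datetime   # recorded automatically from the device clock
    latitude: float         # recorded automatically from GPS, decimal degrees
    longitude: float        # recorded automatically from GPS, decimal degrees
    surface_conditions: Dict[str, bool] = field(default_factory=dict)  # yes/no answers
    photo_files: List[str] = field(default_factory=list)  # optional photographs
    classification: Optional[str] = None                  # optional, e.g., cloud type
    reported_count: Optional[int] = None                  # self-reported, e.g., larvae

    def problems(self) -> List[str]:
        """Return basic consistency problems found in the record (empty if none)."""
        found = []
        if self.tool not in TOOLS:
            found.append("unknown tool")
        if self.measured_at.tzinfo is None:
            found.append("timestamp has no time zone")
        if not -90.0 <= self.latitude <= 90.0:
            found.append("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            found.append("longitude out of range")
        if self.reported_count is not None and self.reported_count < 0:
            found.append("negative count")
        return found


obs = Observation(
    tool="clouds",
    measured_at=datetime(2019, 7, 2, 18, 5, tzinfo=timezone.utc),
    latitude=-30.2,
    longitude=-70.8,
    surface_conditions={"snow_or_ice": False},
    classification="cumulus",
)
print(obs.problems())  # prints []
```

Keeping classification and count optional reflects the text: photographs and classifications are encouraged but not required for a valid observation.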
To improve data usability for researchers and to boost science literacy among volunteers, the protocols are aligned with NASA satellite data where feasible. For land cover, participants are shown the land cover classification at their location from the MODIS 250-m global land cover product (NASA, 2020) and asked to comment on how their observation compares. The clouds protocol is directly linked to daily satellite overpasses. Participants can opt in to receive satellite flyover notifications on their device when Aqua (Parkinson, 2003), Terra (Lee et al., 2000), and CALIPSO (Winker et al., 2003) are flying overhead. As encouragement, participants who make a cloud observation within 15 min of a satellite overpass are sent a personalized email containing their cloud observation alongside the satellite observation. The satellite matching is performed by a team at NASA Langley Research Center and also includes matching to geostationary satellites: GOES, Meteosat, and Himawari (Colón Robles et al.). The matching of satellite data to cloud ground observations by citizen scientists started in 1996 with the CERES S'COOL (Students' Cloud Observations On-Line) Project, which compared student cloud type and cover observations with data from NASA's CERES (Clouds and the Earth's Radiant Energy System) instrument (Chambers et al., 2017). After the S'COOL project migrated to become NASA GLOBE Clouds, the team continued to match observations to the CERES instrument. CALIPSO, another project led at NASA Langley Research Center, was recently added. A 5-year plan is in development to add other satellites and to match satellite data to other GO protocols. Between 1 January 2017 and 1 December 2019, 203,000 GO cloud observations were matched to satellite observations. GLOBE clouds data with satellite matches for 2017-2019 are publicly available (observer.globe.gov/get-data/clouds-data).
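The 15-min matching window can be illustrated with a short Python sketch. It covers only the temporal criterion stated in the text; any spatial co-location rules used by the NASA Langley team are not described here and are left out. The function name, the inclusive treatment of the 15-min boundary, and the example times are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

MATCH_WINDOW = timedelta(minutes=15)  # window stated in the text


def nearest_overpass(
    observed_at: datetime,
    overpass_times: Iterable[datetime],
    window: timedelta = MATCH_WINDOW,
) -> Optional[datetime]:
    """Return the overpass closest in time to the observation.

    Returns None when no overpass falls within the window. Treating a gap of
    exactly 15 min as a match is an assumption of this sketch.
    """
    best = None
    for overpass in overpass_times:
        gap = abs(overpass - observed_at)
        if gap <= window and (best is None or gap < abs(best - observed_at)):
            best = overpass
    return best


utc = timezone.utc
observation_time = datetime(2019, 5, 4, 18, 2, tzinfo=utc)
overpasses = [
    datetime(2019, 5, 4, 16, 40, tzinfo=utc),
    datetime(2019, 5, 4, 18, 10, tzinfo=utc),  # 8 min after the observation
    datetime(2019, 5, 4, 19, 50, tzinfo=utc),
]
print(nearest_overpass(observation_time, overpasses))
```

Using time-zone-aware timestamps throughout avoids mixing local device time with overpass times expressed in UTC.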

GLOBE Observer Data: 2016-2019
Figures 2 and 3 show the geographic coverage of observations made using the app. Thirty-eight thousand users have submitted 320,000 observations from all seven continents. Assuming (1) it takes a participant 5 min to go outside, find a suitable sampling location, and complete an observation with the GLOBE Observer app and (2) an average work year has 2,000 hr, it would take an individual person 13 work years and 5 months to collect the same number of observations. Assuming a GS-12 step 1 pay grade for fiscal year 2019 (opm.gov), this would be equivalent to more than $1 million (USD) in salary and does not even include the cost of sampling equipment or the extensive travel it would require to visit the same locations.
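The effort estimate above is simple arithmetic and can be reproduced as a rough check. The sketch below uses the rounded total of 320,000 observations together with the two stated assumptions (5 min per observation and 2,000 hr per work year); the function name is hypothetical. Because the input is rounded, the result is only expected to agree with the reported figure to within a few months.

```python
def work_years(observations: int,
               minutes_per_observation: float = 5.0,
               hours_per_work_year: float = 2000.0) -> float:
    """Convert an observation count into equivalent work years."""
    total_hours = observations * minutes_per_observation / 60.0
    return total_hours / hours_per_work_year


years = work_years(320_000)
whole_years = int(years)
extra_months = (years - whole_years) * 12
print(f"{years:.2f} work years (about {whole_years} years and {extra_months:.0f} months)")
```

With the rounded count this gives roughly 13.3 work years, consistent with the "more than 13 years" stated in the abstract; the exact number of months depends on the unrounded observation total, which is not reproduced here.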
Data density is becoming great enough to support an increase in scientific and student research, one of the original project objectives. From 2017 to 2019, over 500 U.S. student and youth research projects were documented using GLOBE Observer data, up from 31 student projects in 2016 (Schwerin & Clark, 2019). The number of scientific and educational publications, articles, and presentations using GLOBE Observer data has also been steadily increasing since 2017 (observer.globe.gov/publications). Peer-reviewed research applications of GLOBE Observer data include analysis of the 2017 North American solar eclipse, analysis of global cloud cover observations collected during a month-long international data collection blitz (Colón Robles et al.), and mapping the distribution of Aedes mosquitoes in West Africa (Aïkpon et al., 2019). Figure 2 contrasts the spatial distribution of observations submitted through the GLOBE Observer app and through the traditional GLOBE channels. The app has increased the geographic coverage of GLOBE observations. Some geographic gaps in the coverage exist and will persist in countries that do not participate in GLOBE. As in other international citizen science projects, North America and Europe are the most intensely observed regions (e.g., iNaturalist; Chandler et al., 2017). Notable increases in geographic coverage enabled by the app include India, West Africa, South America, and Australia and are discussed below. Figure 3 shows locations of observations from each app tool individually. The geographic variability in data coverage among the different app tools has several causes. Each tool was introduced to the app at a different time (Table 1), which accounts for some of the variability. Geographic patterns observed in reported data also reflect the demographics and level of engagement of the GLOBE Program in participating countries prior to a tool's launch in the app.
For example, GLOBE schools submitted 87,000 cloud observations in the 12 months prior to the launch of the cloud tool in the app, versus only 1,100 land cover observations submitted in the 12 months prior to the launch of land cover in the app. The difference in activity level seen in traditional GLOBE persists with the app; 35× more cloud observations than land cover observations have been submitted through the app. Another factor driving app activity is internal data collection campaigns, such as the 3-year (2018-2021) Trees Around the GLOBE campaign (GLOBE, 2020). To illustrate this point, the tree height tool in the app has been available for half the time of the land cover tool, but tree height observations are submitted through the app at twice the rate (1,400 tree height observations per month vs. 650 land cover observations per month, on average). GLOBE also encourages participants to investigate science questions that are meaningful locally. As a result, places where mosquito-borne diseases are endemic have adopted the mosquito tool more widely. In Figure 3, there are also notable examples on the map of externally driven data collection in India, in Australia by the Australian Scouts, and in the Arctic and the Southern Ocean by the Polar Collective (polarcollective.org; Colón Robles et al., 2018). As expected, we see hotspots of data collection around locations where the GLOBE Program and partner organizations have conducted in-person trainings. In Figure 3, striking examples of the mark of GLOBE trainings are the hotspots of mosquito observations in Thailand (GLOBE, 2019b), Benin (Aïkpon et al., 2019), Peru, and Brazil (mosquito.strategies.org), where extensive trainings took place as a result of initiatives funded by the U.S. Department of State and USAID (Low et al., 2019). GLOBE has also conducted numerous cloud trainings in South America. The popularity of the cloud tool in the app compared to the other tools merits further discussion.
Cloud observations are more numerous and ubiquitous in part simply because the cloud tool has been in the app longest. Other important contributing factors include the well-organized volunteer base and outreach materials GLOBE inherited from the NASA S'COOL project, which seeded a community of GLOBE teachers collecting clouds data before the app launched (Chambers et al., 2003, 2017). Because of the resource and staffing demands of maintaining satellite matching in full operation, the cloud tool is currently the only tool in the app for which participants can opt in to receive notifications of NASA satellite flyovers on their phone and have their satellite matches emailed to them (Colón Robles et al.). Figure 5 shows the diurnal distribution of submissions coming through the GLOBE Observer app. Cloud observation submissions dominate. The peak around 11:00 UTC is driven by participation in Europe. The peak around 18:00 UTC is driven by North America and coincides with (1) timing of GLOBE Observer social media posts and (2) Aqua flyovers on the East Coast of the United States. In contrast, traditional data submissions from GLOBE peak between 9:00 and 11:00 UTC. Prior to 2016 and the app, GLOBE suggested cloud observations be done at local solar noon, and many traditional GLOBE schools continue this today. The unimodal distribution for traditional GLOBE data submissions is skewed toward the school day in the Middle East. The Saudi Kingdom does not currently support use of the GLOBE Observer app but very actively submits cloud observations through traditional GLOBE channels and is consistently one of the top-contributing countries in the region. Photographs, while qualitative, are a valuable asset to the GLOBE database because they provide visual evidence of reported phenomena (e.g., landslides, haboobs, and Aedes aegypti mosquito larvae) that can be independently verified by data end users.
Here the considerable benefit the GLOBE Observer app brings to the GLOBE Program is the ubiquity of built-in cameras in modern smartphones. Table 1 summarizes the number of photo submissions. Participants have submitted 10× more photos through the GLOBE Observer app than through traditional GLOBE channels over the same time period. Table 1 also summarizes the percentage of participants who submit photos. Cloud and land cover are used for comparison here because they both include the same six directional photos (north, south, east, west, up, and down). The app uses the internal compass and gyroscope on a participant's device and automatically takes the photo when it detects the device is in the correct orientation and direction. This standardizes photos across observations. Taking photographs is optional but strongly encouraged. Missing photos do not invalidate an observation, but the inclusion of photos can increase the observation's trustworthiness and information content (e.g., land cover photos might confirm the reported presence of impervious surfaces and provide contextual clues that an area is undergoing urbanization). The app's design has evolved to encourage participants to take photos with their observation. We see evidence that the intentional design change is working when we compare the older cloud tool to the newer land cover tool. Participants submit land cover photos in all six directions at twice the rate of cloud photos. Design modifications to the app, such as pop-up reminders or reward messages, could potentially close this gap.
The flag set HC, OD, OP, OR, and OX (see Table 2) checks for logical consistency around reports of haze or other obscurations in the sky. These results are consistent with anecdotal participant feedback that it is unclear how to correctly report the presence of haze, smoke, and dust in the app.
The flag set MR and NR checks user-reported mosquito larvae and contrail counts for unexpectedly large values (e.g., mosquito larvae count = 1,000,000). ER checks that the reported elevation is between 6,000 and −300 m (GLOBE, 2019a). Observations flagged ER were measured over the ocean where a negative elevation is returned reflecting the ocean bathymetry. Data consumers are advised to use caution with such flagged data in statistical analyses. For both sets of flags, modest revisions of the app's design could potentially remedy the data quality issue.
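Range checks of this kind can be sketched in a few lines of Python. Only the elevation bounds (−300 to 6,000 m) come from the text. The count cutoffs below are hypothetical placeholders, since the thresholds actually used are not given here, and the assignment of MR to larvae counts and NR to contrail counts is inferred from the order in which the text lists them.

```python
from typing import List, Optional

ELEVATION_RANGE_M = (-300.0, 6000.0)  # bounds stated in the text for the ER flag
MAX_LARVAE_COUNT = 10_000             # hypothetical cutoff for the MR flag
MAX_CONTRAIL_COUNT = 100              # hypothetical cutoff for the NR flag


def range_flags(elevation_m: Optional[float] = None,
                larvae_count: Optional[int] = None,
                contrail_count: Optional[int] = None) -> List[str]:
    """Return the range-check flags triggered by one observation."""
    flags = []
    if larvae_count is not None and larvae_count > MAX_LARVAE_COUNT:
        flags.append("MR")
    if contrail_count is not None and contrail_count > MAX_CONTRAIL_COUNT:
        flags.append("NR")
    if elevation_m is not None:
        low, high = ELEVATION_RANGE_M
        if not low <= elevation_m <= high:
            flags.append("ER")
    return flags


print(range_flags(larvae_count=1_000_000))   # the example value from the text
print(range_flags(elevation_m=-4200.0))      # e.g., ocean bathymetry returned as elevation
print(range_flags(elevation_m=120.0, larvae_count=40, contrail_count=2))
```

Returning flags rather than rejecting records matches the approach described in the text, in which flagged observations stay in the data set and data consumers decide how to treat them.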

Data Quality and Limitations
The most commonly triggered flag, "LW," checks if an observation might be reported over water and is intended to alert data consumers to potentially erroneous locations. LW flags account for more than half of the total flag occurrences in the GLOBE Observer data set. Cloud observations are valid if taken on land or water (e.g., aboard a ship). Land cover, tree height, and mosquito habitat observations should only be collected on land. Here we use Cartopy's 50-m resolution earth geometry as the land/ocean mask for the LW flag (Table 2). Most of the observations flagged LW are valid upon closer inspection; they are taken on land within 50 m of a coastline or are cloud observations reported aboard a ship. A small number (<0.05%) of tree, land cover, and mosquito observations report a location over the open ocean and appear truly erroneous. This may be due to location "spoofing" by a participant to obscure their location (Zhao & Sui, 2017), or reported performance issues with the app's map function when a participant's device is out of cellular or Wi-Fi range and cannot retrieve reliable GPS coordinates.
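The logic of a land/ocean mask check can be illustrated with a toy example. The report's mask comes from Cartopy; to stay self-contained, this sketch substitutes a hypothetical rectangular "island" for the real land geometry and uses a standard even-odd ray-casting test for point-in-polygon membership. All names and coordinates are illustrative assumptions, not the project's implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees

# Hypothetical stand-in for real land polygons: one rectangular "island".
TOY_LAND: List[List[Point]] = [[(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]]


def point_in_polygon(lon: float, lat: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray running east from the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # the edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def lw_flag(lon: float, lat: float, land: Sequence[Sequence[Point]] = TOY_LAND) -> bool:
    """Return True when the location falls outside every land polygon."""
    return not any(point_in_polygon(lon, lat, polygon) for polygon in land)


print(lw_flag(5.0, 5.0))    # inside the toy island
print(lw_flag(15.0, 5.0))   # east of the toy island, so flagged
```

A flag raised this way marks a location for review rather than proving an error: as noted above, cloud observations made aboard a ship and observations made close to a coastline can be flagged and still be valid.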
A limitation of GO data not captured in Figure 6 is unclassified observations. Classifying cloud type, land cover type, or mosquito genus is optional in the app. Cloud type is classified only 68% of the time, land cover type 43%, and mosquito genus only 14%. Unclassified images are still valuable because they provide a durable visual record of what a participant reports that an end user can check. However, lack of classification limits discoverability by automated keyword searches. A small pilot project is being conducted internally to explore the use of crowdsourcing to label unclassified GLOBE Observer photos to overcome this limitation.

Lessons Learned
Several lessons have been learned about participant photos. Outreach messaging from GLOBE Observer has predominantly aimed to encourage participants to complete the classification steps in the app (i.e., classify cloud type, land cover type, or mosquito type). However, it is evident in the available data that a majority of participants will take photos and a much smaller percentage will complete the in-app classification. End users of the data have expressed a strong preference for observations that include photos. A frequently requested feature is a mechanism for scientists to ask participants to take photos at a particular time, location, or phone orientation, or of a particular phenomenon. Taken together, this suggests outreach messaging and in-app user experience should pivot toward and optimize for targeted photo-taking. Photo labeling could be crowdsourced on a platform like Zooniverse to assemble a training data set for AI-assisted image classification (e.g., Fortson et al., 2018; Willi et al., 2019). AI-assisted image classification could facilitate rapid in-app feedback to participants and would increase the information content available for research. Early work using Amazon's Rekognition™ AI software is in progress by the GLOBE Data Information Systems (DIS) team to detect and blur out human faces and text such as automobile license plates.
Cloud satellite matching is popular with participants and is being leveraged in research to a greater extent than any other data product (Ault et al., 2006; Chambers et al., 2017; Dodson et al., 2019). Satellite matching is performed by a team at NASA Langley Research Center with the goal of combining ground-based and spaceborne perspectives to increase the amount of information about a single cloud scene. The analysis here adds to the growing body of evidence supporting the value of satellite matching. Since 2017, 203,000 observations have been matched to Aqua, Terra, CALIPSO, GOES, Himawari, and Meteosat satellite retrievals. We find here that the GLOBE Observer data set contains more than an order of magnitude more cloud observations than any other type of observation. We also find that satellite overpasses contribute to peak submission times over the course of a day. This suggests that the expansion of satellite matching to the mosquito, land cover, and tree tools in the app could be a worthwhile investment for the GLOBE Observer project team. The satellite matching component has been an effective way to engage citizen scientists and provides co-located, independent data that can be leveraged in research.

Conclusions
GLOBE Observer is a NASA-sponsored international citizen science project (observer.globe.gov) that is part of the GLOBE Program founded in 1995 (globe.gov). This article presents the first summary of the GLOBE Observer data set. GLOBE Observer launched a mobile app in 2016 for Android and iPhone devices that anyone in a participating GLOBE country can use to make observations (including photographs) of cloud cover and cloud type, mosquito breeding sites and mosquito type, land cover type, and tree height.
Between 1 April 2016 and 1 December 2019, 38,000 participants submitted 290,000 cloud, 19,000 mosquito, 8,400 land cover, and 9,500 tree observations spanning all seven continents. This represents a 1.91-fold increase in data volume in the GLOBE Program's database for these four protocols over the period of April 2016 to December 2019 and a substantial expansion in geographic coverage. The majority of observations are submitted from Europe and North America between the hours of 11:00 and 18:00 UTC. Proportionally fewer GLOBE Observer observations (7.8%) than traditional GLOBE observations (13%) are flagged for quality; in both cases, the most likely reason for an observation to be flagged is that the location is potentially over water. GLOBE Observer data are publicly available (observer.globe.gov) and offer a novel ground-based data set to augment spaceborne, airborne, and in situ Earth observations. Analysis of the data here suggests the satellite matching for clouds is a notably successful feature. The data suggest that expanding satellite matching to the land cover, mosquito, and tree tools and optimizing the app for targeted photo-taking could be productive avenues of development.