Crowdsourcing Geospatial Data for Earth and Human Observations: A Review

The transformation from authoritative to user-generated data landscapes has garnered considerable attention, notably with the proliferation of crowdsourced geospatial data. Facilitated by advancements in digital technology and high-speed communication, this paradigm shift has democratized data collection, obliterating traditional barriers between data producers and users. While previous literature has compartmentalized this subject into distinct platforms and application domains, this review offers a holistic examination of crowdsourced geospatial data. Employing a narrative review approach due to the interdisciplinary nature of the topic, we investigate both human and Earth observations through crowdsourced initiatives. This review categorizes the diverse applications of these data and rigorously examines specific platforms and paradigms pertinent to data collection. Furthermore, it addresses salient challenges, encompassing data quality, inherent biases, and ethical dimensions. We contend that this thorough analysis will serve as an invaluable scholarly resource, encapsulating the current state-of-the-art in crowdsourced geospatial data, and offering strategic directions for future interdisciplinary research and applications across various sectors.


Introduction
Over the past several decades, human and Earth observations have been overwhelmingly dictated by traditional, authoritative data sources, such as population censuses, surveys, satellite imagery, and other physical sensors. However, the landscape of data creation and analysis has undergone a seismic shift in recent times, fueled primarily by the advent of revolutionary paradigms such as Web 2.0 [1] and Big Data [2]. This paradigm shift was precipitated by several key factors, including widespread internet access, the ubiquity of smartphones, and a general surge in participatory culture [3]. The impact of this transition has been profound across various industries. In sectors like urban planning, transportation, and environmental monitoring, user-generated data have provided unprecedented real-time insights and community-driven perspectives, often leading to more responsive and adaptive decision-making processes [4]. In the commercial sector, businesses harness user-generated content for enhanced market research, customer engagement, and trend analysis, leading to more customer-centric product development and marketing strategies. The importance of this shift lies in its empowerment of ordinary individuals to contribute to and influence fields traditionally dominated by experts and authorities. This democratization has not only diversified the types of data available but also led to richer, more multifaceted insights into human behavior and environmental changes.
It is important to recognize that this democratization and the ensuing influx of user-generated content have closely linked human experiences with environmental monitoring and analysis. For instance, the real-time tracking of human mobility patterns using smartphone data can greatly enhance our understanding of urban dynamics, traffic management, and even disaster response, effectively bridging the gap between human behaviors and environmental impacts. Similarly, public engagement in reporting environmental changes, like air quality or weather conditions, through mobile applications or social media platforms, brings a unique and valuable human dimension to Earth observations. These interconnected contributions emphasize a vital, yet previously underexplored, synergy between human and Earth data sources, illustrating a more cohesive narrative of how human activities and Earth systems are inextricably linked.
The term "crowdsourcing (crowdsourced) geospatial data," which is used extensively throughout this paper, encapsulates the data acquisition process undertaken by large, diverse groups of individuals who often lack professional training [3]. The term "NeoGeography," introduced by Turner [5], conveys a broader contextual understanding through the sharing of location data, enabled by an ever-expanding array of freely accessible tools. In the same vein, "Volunteered Geographic Information (VGI)" [6] signifies the rising trend of ordinary citizens playing an active role in the creation of geographic information. VGI is characterized as "the employment of tools to create, assemble, and distribute geographic data provided voluntarily by individuals." Another pertinent term, "Citizen Science," refers to the active participation of the public in scientific research, monitoring, and action research, which often culminates in scientific progress and a broader public understanding of scientific principles [7,8]. Despite the varying focuses of these definitions, they all emphasize the growing importance and impact of non-authoritative data sources. The simultaneous advancement of rapid, accurate positioning technology, the prevalence of digital devices, the accessibility of high-speed communication links, and the progression in data management techniques have expedited the conceptual, methodological, and practical evolution of crowdsourced geospatial data. Contrasting with the traditional human and Earth observation methods, which are primarily coordinated by governmental and large institutional entities, the data collection process has been increasingly democratized, incorporating the participation of everyday users. This development effectively diminishes the erstwhile barrier between data producers and users. Such an innovative, decentralized approach to data collection, bolstered by an extensive global user base, potentially facilitates high-resolution spatiotemporal observations that were previously unattainable.
Crowdsourcing geospatial data has been analyzed through a multitude of lenses in recent studies. Numerous scholarly reviews have adopted a categorical approach to structure their analyses, centering on data source types. These include various platforms such as social media [9,10], OpenStreetMap (OSM) [11], and an array of other participatory datasets [12,13]. On the other hand, some reviews have taken a domain-focused approach, examining the practical applications of crowdsourcing geospatial data in fields like disaster mitigation [14], public health [15], remote sensing [16], and urban sciences [17]. While these studies offer valuable insights, a common limitation is the absence of a comprehensive, overarching perspective that ties together the various data sources and application domains. The primary motivation behind this review is to bridge this existing gap by offering a holistic perspective of crowdsourcing geospatial data. This will aid in fostering an improved interdisciplinary dialogue, supporting the development of innovative strategies and facilitating more efficient utilization of crowdsourced geospatial data across different sectors.
In addressing the interdisciplinary nature of this topic, we opted for a narrative review over traditional systematic or meta-analytical methods. Systematic reviews and meta-analyses, reliant on keyword-based database queries and quantitative data aggregation, often fail to adequately capture the nuanced and theoretical aspects crucial for such a multifaceted topic, necessitating substantial post-selection refinement [10]. Our narrative approach, emphasizing seminal works identified by subject-matter experts, offers a more targeted and insightful exploration. This method ensures comprehensive coverage and a contextual understanding, circumventing the limitations of keyword-dependent searches and the quantitative constraints of meta-analyses in addressing the complex dimensions of our interdisciplinary study. The articles assessed in this undertaking were intentionally chosen by the authors, who possess substantial expertise in crowdsourcing studies and have conducted interdisciplinary inquiries using crowdsourcing data. This targeted selection process ensures that the review encapsulates diverse perspectives and a rich array of experiences in this complex and evolving field.
In this study, we conduct an exhaustive analysis of the current efforts, possibilities, and obstacles associated with crowdsourced geospatial data across two fundamental perspectives: Earth observations ("Crowdsourcing Earth Observations" section) and human observations ("Crowdsourcing Human Observations" section). We group the applications of crowdsourcing geospatial data into varying domains, dissect the traits of data and contributors for widely recognized crowdsourced geospatial platforms, and investigate their data collection paradigms and application potential in detail. Furthermore, we discuss the intrinsic challenges ("Challenges in Crowdsourcing Earth and Human Observations" section) connected to crowdsourced geospatial data, considering facets such as data quality and accuracy, data bias, privacy concerns, legal and ethical dimensions, the sustainability of data collection, training and validation requirements, and issues surrounding data interpretation. This section is followed by a forward-looking discussion on prospective directions and pathways ("Future Directions and Pathways" section). The organizational layout of this review is illustrated in Fig. 1.
We believe that this comprehensive review will serve as an invaluable touchstone, encapsulating the concerted efforts in human and Earth observations utilizing crowdsourced geospatial data and providing future direction for effectively utilizing information gathered from crowdsourcing platforms to address extant and future challenges.

Crowdsourcing Earth Observations
In the era of rapidly evolving technology and increasing environmental concerns, crowdsourcing Earth observations has Downloaded from https://spj.science.org on January 22, 2024

Weather and climate observations
The progression of technology has significantly advanced crowdsourcing methods for geospatial data collection for weather and climate observations. These methodologies can be effectively segmented into four principal categories, i.e., citizen science, social media, in situ sensors, and smart devices, with each offering unique benefits, facing specific challenges, and bringing distinct values.
Citizen Science stands out as a transformative force, especially in the domains of historical data retrieval and ongoing environmental monitoring. Endeavors such as Old Weather [18] and Cyclone Centre [19] highlight the tremendous capability of leveraging the general populace in data transcription and categorization. The Global Learning and Observations to Benefit the Environment Programme (GLOBE) further integrates students and educators in environmental measurements that adhere to stringent scientific standards. Moreover, ventures such as CoCoRaHS [20] and We Sense It (http://www.wesenseit.com/web/guest/home) underscore the significant contributions of community-driven networks, ensuring superior data collection quality. These data profoundly influence areas like climate change assessments [21], meteorological forecasting [22], and advancing knowledge about extreme weather events [23].
Social media platforms including Twitter and several mobile apps are progressively utilized as real-time data repositories. Initiatives like the UK snow map (https://uksnowmap.com/) and Twitcident (http://twitcident.com/) [24] emphasize the importance of user-generated content in monitoring events, from snow patterns to storm occurrences. Moreover, applications such as Metwit (https://metwit.com/) and Weddar (http://www.weddar.com/) procure localized meteorological data, forming a nexus between personal experiences and analytical data. Yet, the intrinsic attributes of social media, which can occasionally propagate misinformation, demand the adoption of meticulous filtering mechanisms.
In situ sensors have become pivotal in the data collection landscape. The incorporation of internet-capable, cost-effective sensors, integrated into individual weather stations or larger networks, has magnified both the quantity and detail of data accrued. Networks like Air Quality Egg (https://airqualityegg.com/home), a community-led air quality-sensing network, and Weather Underground (https://www.wunderground.com/) expedite real-time data acquisition from diverse origins. Although these devices are cost-effective, they necessitate rigorous calibration to validate data precision.
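The calibration requirement noted above can be illustrated with a minimal sketch. It assumes a low-cost sensor has been co-located with a reference-grade monitor and that a simple linear correction is adequate; the function names and the readings are hypothetical and are not drawn from any of the networks cited here. Real calibration schemes often also account for humidity, temperature, and sensor drift.

```python
from statistics import mean


def fit_linear_calibration(raw, reference):
    """Ordinary least-squares fit of: reference = slope * raw + intercept."""
    if len(raw) != len(reference) or len(raw) < 2:
        raise ValueError("need at least two paired readings")
    raw_mean, ref_mean = mean(raw), mean(reference)
    sxx = sum((x - raw_mean) ** 2 for x in raw)
    if sxx == 0:
        raise ValueError("raw readings must not all be identical")
    sxy = sum((x - raw_mean) * (y - ref_mean) for x, y in zip(raw, reference))
    slope = sxy / sxx
    intercept = ref_mean - slope * raw_mean
    return slope, intercept


def apply_calibration(raw_value, slope, intercept):
    """Correct a single raw reading with the fitted coefficients."""
    return slope * raw_value + intercept


# Hypothetical co-location data: low-cost sensor vs. reference monitor.
raw_readings = [10.0, 20.0, 30.0, 40.0]
reference_readings = [6.0, 11.0, 16.0, 21.0]

slope, intercept = fit_linear_calibration(raw_readings, reference_readings)
corrected = apply_calibration(50.0, slope, intercept)
```

With these illustrative numbers the low-cost sensor over-reports, and the fitted correction scales new readings down toward the reference scale.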
Smart devices are increasingly designed to connect with a variety of sensors, such as the BlutolTemp Thermometer [25], iCelsius thermistor (https://www.icelsius.com/), and iSPEX aerosol measuring sensor (www.ispex.nl), greatly facilitating dense data acquisition, predominantly in metropolitan settings. Notable initiatives, like the N-Smarts pollution project [26], leverage these sensor-equipped smartphones to understand urban air pollution's effects on individuals and communities. This plethora of sensors facilitates a crowdsourcing approach termed "human-in-the-loop sensing," enabling the collection and analysis of real-world data. Apps like OpenSignal (https://www.opensignal.com/apps) and PressureNet (http://pressurenet.cumulonimbus.ca) harness smartphone sensors to collect real-time weather data. However, there are challenges in using such data, primarily due to the potential variations in local weather conditions, which raise concerns about data accuracy and consistency.
In conclusion, the convergence of technological innovation and community collaboration has significantly transformed the realm of geospatial data collection, especially in weather and climate studies. Citizen science initiatives democratize the scientific process and ensure a consistent influx of essential data. Although social media introduces certain complexities, it provides unparalleled real-time insights. In situ sensors and smart devices enhance the precision and depth of data collection, facilitating more sophisticated interpretations. When integrated with rigorous validation and calibration methodologies, these tools are poised to drive future breakthroughs in environmental and atmospheric research.

Biodiversity
A plethora of crowdsourcing geospatial data has been accumulated through biodiversity citizen science projects for documenting and monitoring plants, animals, and other species on Earth [27]. Biodiversity citizen science projects provide infrastructure and platforms (e.g., website and app) to communities of volunteers (e.g., nature observers) who are interested in any aspects of biodiversity to contribute species observations [28,29]. A variety of biodiversity citizen science projects are in operation, attracting millions of contributors to report species observations. These projects may differ in species specialization and/or geographic scope, but together, they provide valuable data on species distribution, helping scientists monitor biodiversity. Three representative biodiversity citizen science projects are introduced here for illustration.
iNaturalist (https://www.inaturalist.org/), launched in 2008, enables anyone to share species observations of all taxa around the world by uploading species photos [29,30]. Each species observation is referenced with a geographic location and time (e.g., extracted from photos), and a species identification vetted by the community. Information about the observer and identifier is also retained. iNaturalist is arguably the largest biodiversity citizen science project in the world, having compiled over 148.5 million observations on more than 431,900 species based on contributions from over 2.7 million observers and 317,000 identifiers. eBird (https://ebird.org/) is for birdwatchers around the world to share bird sightings by submitting their birding checklists [28]. Essential information in a checklist includes location and time of the birding event, list of bird species observed, bird activity (e.g., breeding and behavior codes), bird group size, and information about the observer. A total of 10,715 bird species have been reported to eBird based on 82. Datasets resulting from biodiversity citizen science projects contain, at the minimum, spatially and temporally referenced records of the observed species and, in some cases (e.g., eBird and iNaturalist), even information about the underlying human volunteers who carry out the observations [31]. Species records can be analyzed to reveal the spatiotemporal dynamics of species distributions [32,33] to help inform conservation strategies, while information regarding data contributors allows examining volunteers' data contribution behavior patterns [34,35]. It should be noted that, although crowdsourcing offers an effective means for compiling timely biodiversity data at very large scales, the volunteers behind biodiversity data production have varied levels of expertise [36] and their observation efforts are highly variable across space, time, and observation targets [34], leading to potential biases in the data (e.g., more bird observations are made during migration seasons on common species in accessible geographic areas). Such biases must be assessed and mitigated where necessary in order to make robust inferences from the data [37,38,39].
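One simple way to see why uneven observation effort matters is to compare raw sighting counts with a reporting rate, i.e., the fraction of checklists in a region that include a given species. The sketch below is illustrative only: the checklist structure, region labels, and data are hypothetical, and this normalization is a generic technique rather than the specific method used in the studies cited above.

```python
from collections import defaultdict


def reporting_rate(checklists, species):
    """Return, per region, the fraction of checklists that report `species`.

    `checklists` is an iterable of (region, set_of_species) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for region, observed in checklists:
        totals[region] += 1
        if species in observed:
            hits[region] += 1
    return {region: hits[region] / totals[region] for region in totals}


# Hypothetical checklists: the accessible park is visited far more often.
checklists = [
    ("urban_park", {"American Robin", "House Sparrow"}),
    ("urban_park", {"House Sparrow"}),
    ("urban_park", {"American Robin"}),
    ("urban_park", {"House Sparrow"}),
    ("remote_forest", {"American Robin"}),
]

rates = reporting_rate(checklists, "American Robin")
```

In this toy example the park yields more raw robin records (two versus one), yet its reporting rate is lower once the number of checklists is taken into account, which is the kind of effort-driven distortion that bias assessment aims to expose.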

Air and water quality monitoring
Crowdsourcing has emerged as a cost-effective tool for collecting data on air quality, enabling broader spatial coverage and higher temporal resolution than traditional regulatory-grade monitoring. PurpleAir, an affordable, accessible, and easy-to-use low-cost air sensor network, has become one of the most popular and largest crowdsourced networks worldwide [40]. The dense network is built with the help of stakeholders such as residents, environmental and public health agencies, and university researchers to measure real-time particulate matter (PM) at residential areas, industrial facilities, schools, and various other places of interest [41,42]. For example, PurpleAir sensor data are integrated into AirNow, a one-stop air quality data platform developed by major agencies, including the US Environmental Protection Agency (EPA), to provide historical, current, and future air quality data, especially during wildfire seasons [43]. Similarly, "AirVisual," developed by IQAir, also engages citizen scientists to collect data on various pollutants (e.g., PM, ozone, and nitrogen dioxide) mainly through indoor (e.g., home, office, and hospital) and mobile (e.g., travel) applications [44]. SmartCitizen offers a kit with open-source files and schematics, enabling users to customize their air measurement needs through citizen science [45]. The Clarity monitoring network enhances traditional air monitors by incorporating solar panels to sustain internal batteries and enable the measurement of black carbon [46]. Additionally, Air Quality Egg aims to empower K-12 students to become citizen scientists by measuring multiple pollutants, including carbon monoxide and sulfur dioxide [47]. Overall, crowdsourced platforms have emerged as integrated systems for short-term and long-term community-based air quality monitoring networks, enabling the public to measure and report real-time crowdsourced data. Importantly, the crowdsourced air quality data could be combined with satellite remote sensing data, meteorological data, noise data, and other smartphone applications to revolutionize environmental exposure assessment, disaster resilience, urban planning, environmental justice, and epidemiological studies [48,49].
Crowdsourcing has also become increasingly valuable in water quality monitoring, engaging citizen scientists to collect data on a large scale. For example, the Secchi disk, a plain white, circular disk with a diameter of 30 cm, is commonly used for measuring water transparency or turbidity. Citizens, scientists, nongovernment organizations, and other stakeholders have employed this tool to increase networks of in situ measurements mainly in oceans [50] and lakes [51,52], supplementing traditional monitoring networks. In particular, the North American Lake Management Society has held annual crowdsourcing events, known as the "Secchi Dip-In," since 1994 to engage lake enthusiasts and volunteers in contributing to a comprehensive database of water quality [53]. State agencies also encourage the use of crowdsourced water quality monitoring to supplement routine water monitoring activities, such as the Citizens Statewide Lake Assessment Program [54] and the Clean Water Team under the Surface Water Ambient Monitoring Program [55]. Volunteers are trained to collect samples and measure various parameters, such as dissolved oxygen, pH, and nutrient levels, to characterize water quality. Additionally, the environmental charity Earthwatch Europe launched a global citizen science project, FreshWater Watch, in 2012. This initiative engages volunteers in using the same standardized research method to monitor various water bodies, including rivers, lakes, streams, ponds, and wetlands, on a broader scale and to identify regional and global trends [56]. The Surfrider Foundation's Blue Water Task Force is a crowdsourcing initiative focused on monitoring water quality at beaches and coastal areas to protect public health and advocate for clean water policies [57]. These example programs share a recognition of the advantages of crowdsourced data in water quality monitoring. By leveraging citizen science and engaging the public to collect data on parameters essential to understanding water quality, the crowdsourced database helps to enhance spatial and temporal coverage, expand the types of waters monitored, support decision-making, and inform efforts in water resource management and conservation.

Natural hazards and disaster response
Natural hazards like flash floods or earthquakes usually take place in a short period, while their impacts vary by communities and individuals and depend on local context, e.g., hazard conditions, natural and built environment, socioeconomic characteristics, management strategies, and responding behaviors. Therefore, timely and hyperlocal observations of natural hazards and affected communities are necessary to understand the impacts of natural hazards and formulate immediate, actionable disaster response that can minimize loss of lives, property damages, and social and environmental disruptions. Harnessing crowdsourced geospatial data from social media platforms, smartphone apps, or crowdsourcing websites offers a novel avenue for observing short-term, localized events with exceptional spatial and temporal resolutions [58]. Several initiatives have adeptly harnessed crowdsourced data to observe natural hazards and their impacts, as well as to support disaster responses.
For example, the United States Geological Survey (USGS) developed the "Did You Feel It?" (DYFI) website in 1999 to gather information about the impacts and effects of earthquakes from people who have experienced them. When an earthquake occurs, individuals in the affected area can visit the DYFI website and provide details about their experience, including their location, the level of shaking they felt, and any observed effects, such as swaying of buildings, rattling objects, or other impacts. This information is then aggregated and displayed on an interactive map, allowing users to see how the shaking was perceived across different areas. Crowdsourced DYFI data have proven valuable in better understanding the spatial distribution of shaking intensity, which can be used to refine earthquake hazard maps [59], evaluate the performance of buildings and infrastructure, and improve earthquake engineering practices [60].
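The aggregation step described above can be sketched as a simple spatial binning of individual reports. This is a minimal illustration under stated assumptions, not the USGS procedure: the function name, the grid cell size, the use of a plain mean, and the sample reports are all hypothetical.

```python
import math
from collections import defaultdict


def aggregate_felt_reports(reports, cell_size_deg=0.5):
    """Group (lat, lon, intensity) reports into square grid cells.

    Returns {cell_index: (mean_intensity, report_count)}, where cell_index
    is the (row, column) of the cell on a regular latitude/longitude grid.
    """
    cells = defaultdict(list)
    for lat, lon, intensity in reports:
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        cells[cell].append(intensity)
    return {
        cell: (sum(values) / len(values), len(values))
        for cell, values in cells.items()
    }


# Hypothetical reports: (latitude, longitude, reported shaking intensity).
reports = [
    (34.05, -118.25, 4),
    (34.20, -118.30, 6),
    (34.70, -118.10, 3),
]

summary = aggregate_felt_reports(reports)
```

Keeping the report count alongside the mean lets a map distinguish cells backed by many responses from cells that rest on a single report.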
Another notable example is Ushahidi (meaning "testimony" in Swahili), an open-source platform that facilitates crowdsourcing, mapping, and data visualization for hazardous events or crises. This platform empowers both individuals and organizations to collect reports that elucidate local damages and identify individuals needing help during natural hazards. These reports are derived from diverse sources like text messages, social media posts, emails, and web forms [61]. The reports are then aggregated and displayed on an interactive web map to coordinate disaster response missions. Since its inception in 2007, Ushahidi has been employed in over 90,000 cases, received more than 6.5 million reports from 160 countries, and played a pivotal role in several disaster response tasks, such as searching and rescuing victims during the 2010 Haiti Earthquake [62].
CrowdSource Rescue (CRS) is a similar platform that uses crowdsourced geospatial data to bolster disaster response. It was established in 2017, prompted by the unprecedented flooding caused by Hurricane Harvey in Houston and the surrounding areas. During Harvey, many people failed to evacuate on time while the 911 system was overloaded [63]. As a result, flood victims turned to online platforms (e.g., Twitter and Facebook) and volunteer groups (e.g., Cajun Navy) to seek help [64]. CRS emerged as a solution to aggregate these help requests through a crowdsourcing methodology and present them via an interactive WebGIS platform. This platform enables victims to submit help requests and allows certified volunteers to access the geographical locations of individuals seeking help and extend assistance accordingly. As of 2023, CRS has engaged over 13,513 rescuers and volunteers, facilitating aid to more than 94,036 survivors across 28 hazardous incidents.
In addition, disaster-affected regions with limited geospatial data and experts can benefit from geospatial data crowdsourced by cartography professionals or volunteers. The Humanitarian OSM Team (HOT) is a pioneering and collaborative effort that leverages crowdsourced geospatial data to co-produce mapping products for decision-making in disaster response. Using OSM, cartography professionals or volunteers can contribute to satellite image digitization and mapping roads, buildings, rivers, and essential features in disaster-affected regions with limited mapping resources. This collective endeavor yields comprehensive and up-to-date geographical data that significantly enhance disaster response efforts by supporting the assessment of fundamental infrastructure (e.g., transportation and shelters), identification of resource availability, delineation of impacted zones, evaluation of damages, and estimation of affected populations. HOT's establishment was prompted by the 2010 Haiti Earthquake, and its contributions have been demonstrated in various natural hazard events, including the 2015 Nepal Earthquake [65].

Land cover and land use
Citizen science has emerged as a powerful tool for collecting and analyzing data across a wide range of disciplines, including land cover and land use studies. The integration of citizen science with Earth observation has the potential to provide valuable calibration and validation data, covering a diverse set of fields from disaster response to environmental monitoring. This integration has yet to be fully exploited, and there is a significant opportunity for citizen science to contribute to the achievement of the United Nations Sustainable Development Goals, including those related to land use and land cover [66].
Several projects have demonstrated the potential of citizen science in this domain. For instance, the Geo-Wiki project has harnessed the collective efforts of a global network of volunteers to enhance the quality of geographic data, primarily through the validation and correction of existing land cover maps [67,68]. The tool has found particular utility in Central Europe, where it has been extensively employed to improve land cover maps [69]. Similarly, the LUCID project has engaged participants in identifying types of land cover in satellite images to monitor land use changes over time [70]. The Missing Maps project has mobilized volunteers to use satellite images to map areas that are home to vulnerable populations but are poorly covered in existing maps. These projects, among others, have shown that citizen science can provide high-quality data for land use and land cover classification tasks, and that local knowledge and professional background have a minimal impact on volunteer performance in these tasks [66]. Additionally, these ground-based observations frequently provide essential datasets for training machine learning algorithms or for developing mapping rules, thereby enhancing the interpretation of satellite imagery and improving the accuracy and dependability of land cover classifications obtained from satellite data [71].
The integration of citizen and community science into land cover, land use, and land change detection processes has been explored in various contexts. For instance, Olteanu-Raimond et al. [72] proposed an experimental framework for integrating citizen science into a national mapping agency. Theobald [73] put forth a general-purpose spatial survey design for collaborative science and monitoring of global environmental change. In addition, Kolstoe et al. [74] leveraged citizen science to study the differential impacts of climate and land cover on bird populations in the Pacific Northwest. In a similar vein, Whitehorn et al. [75] utilized a decade of citizen science observations to investigate the effects of climate and land use on British bumblebees. These and other projects demonstrate the diverse applications of citizen science in monitoring and understanding changes in land cover and land use.

Urban planning and infrastructure
Data on the infrastructure in the built environment, such as buildings, roads, amenities, and public open spaces, are crucial for urban planning and supporting livable and smart cities. The acquisition and management of such data have traditionally been in the realm of governments, but the crowdsourcing route has been gaining momentum in the past decade with increasing completeness and application of such data in academia and practice. Further, the surge of sensors and the volume of user-generated geographic information in cities have introduced new means to collect information on the built environment. For example, crowdsourced datasets that have recently been gaining attention in the built environment include real estate ads and accommodation reviews, which are used to collect information on building characteristics [76,77,78].
OSM, the open and collaborative map of the world that is built by a community of mappers, is perhaps the most relevant instance of crowdsourcing in this context [79]. The community contributes and maintains data about roads, trails, cafes, railway stations, etc., with the content reaching high levels of quality globally. For example, currently, there are more than half a billion buildings mapped around the world, reaching full completeness, high quality, and rich semantic information in many urban areas around the world [80,81]. Taking advantage of this trend, data on a variety of features sourced from OSM have been used for a myriad of purposes in urban planning, supporting, e.g., modeling urban change and characterizing the urban form, real estate analyses, and population studies [82,83,84,85]. Considering the continuous popularity of OSM and the growing role of corporate editors [86], i.e., companies that contribute data to OSM, the platform is expected to remain very relevant and instrumental in the crowdsourced mapping of the urban infrastructure and its applications in urban planning.
Mapillary, a platform that manages crowdsourced imagery to create a visual representation of the world for improving maps [87], is another increasingly popular instance of crowdsourcing spatial data in the built environment, as street-level imagery plays an important role in urban planning and infrastructure management. Despite being a relatively new type of spatial data, street-level imagery has rapidly gained attention for a variety of use cases in the built environment because it offers a new perspective and other advantages such as dense coverage. Initially, use cases were dominated by data provided by commercial services such as Google Street View and Baidu Maps, with many demonstrated uses for mapping infrastructure and supporting urban planning, such as collecting information on buildings, assessing perception of streetscapes, and mapping greenery [88,89,90]. However, recent years have seen a growing use of Mapillary for similar purposes, offering an alternative under a liberal license and with various advantages such as imagery taken from bicycles and in open public spaces, owing to the heterogeneity of contributors, and offering dense coverage in urban areas that is often not available in commercial counterparts. For example, Mapillary has been used to extract detailed information on buildings to generate three-dimensional (3D) building models at a high level of detail, map networks of bicycle paths, and measure greenery along roads [91,92,93,94]. While Mapillary remains the most popular crowdsourced platform for street-level imagery, there are alternative services that are more popular in particular regions, e.g., KartaView, which is particularly focused on Southeast Asia.

Astronomical observations
Astronomical observations, as part of the emerging concept of "Citizens as Sensors," leverage the potential of crowdsourcing for capturing celestial phenomena and contributing to a global dataset. This active data collection, engaged by voluntary contributors worldwide, provides a unique opportunity to gather scientific data on a scale that would be otherwise impossible with traditional methods. The "Globe at Night" and "Aurorasaurus" projects, two remarkable examples of this burgeoning field, harness the enthusiasm and curiosity of citizen scientists to contribute to our understanding of the cosmos.
"Globe at Night" is an international citizen science project inviting global participants to measure and submit observations of their night sky's brightness. It aims to construct a global dataset of light pollution, a growing concern that affects astronomical observations and the natural world. This initiative provides a cost-effective method for obtaining comprehensive geospatial data on light pollution, with citizen scientists worldwide acting as "sensors" in their locales. The protocol is simple: participants engage in a "star hunt" during moonless nights, recording the faintest star visible to them. These data are then submitted along with the date, time, and location, contributing to a worldwide light pollution map. From 2006 to 2019, this project accumulated over 190,000 data points with more than 200,000 measurements from 180 countries, providing a rich dataset utilized in numerous domains, including city ordinances, school science projects, and monitoring conditions near observatory sites. The participatory nature of the project encourages public awareness about light pollution, turning citizen scientists into advocates for darker skies [95].
Similarly innovative, the "Aurorasaurus" project crowdsources observations of the aurora (both positive and negative), providing real-time data that can assist in forecasting these extraordinary events. The utility of this initiative was demonstrated during a significant geomagnetic storm event when an unprecedented number of sightings were reported, illustrating the platform's potential for large-scale, real-time data collection on auroral activity. Like "Globe at Night," this project serves dual purposes: collecting observations and fostering public understanding of auroras and space weather. The process of submitting observations involves detailing auroral activity, color, and height in the sky, often accompanied by a photograph. The platform further enriches its dataset by combing social media (e.g., Twitter) for likely aurora sightings [96].
Both "Globe at Night" and "Aurorasaurus" are emblematic of the potential of crowdsourcing in astronomical observations. By capitalizing on the enthusiasm of the public and the ubiquity of smart devices, these projects collect data on a scale that would be unfeasible through traditional means. Moreover, they turn every participant into an advocate for scientific understanding, fostering a deeper appreciation for the natural world. In this sense, they embody the essence of gamification described by Ahlqvist and Schlieder [97], transforming a mundane data collection exercise into an engaging and enjoyable activity. While this section has mainly focused on these two projects, the vast potential and applicability of crowdsourced astronomical observations are worth noting. The combination of citizen science and geospatial data collection represents a powerful tool for scientific discovery that is only beginning to be fully realized.
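To make the "Globe at Night" workflow described above more concrete, the following minimal Python sketch shows how submissions of the faintest visible star, each tagged with a location, might be aggregated into a coarse light pollution map. The record fields (`lat`, `lon`, `limiting_mag`), the 1-degree grid, and the use of the median are illustrative assumptions, not the project's actual data schema or processing pipeline.

```python
import math
from collections import defaultdict
from statistics import median


def grid_cell(lat, lon, cell_deg=1.0):
    """Return the south-west corner of the grid cell containing a point."""
    return (math.floor(lat / cell_deg) * cell_deg,
            math.floor(lon / cell_deg) * cell_deg)


def median_limiting_magnitude(observations, cell_deg=1.0):
    """Group observations by grid cell and take the median limiting magnitude.

    `observations` is a list of dicts with hypothetical keys
    'lat', 'lon', and 'limiting_mag' (faintest star magnitude reported).
    Lower values indicate brighter, more light-polluted skies.
    """
    cells = defaultdict(list)
    for obs in observations:
        cell = grid_cell(obs["lat"], obs["lon"], cell_deg)
        cells[cell].append(obs["limiting_mag"])
    return {cell: median(values) for cell, values in cells.items()}


# Hypothetical submissions: two from the same cell, one from another.
observations = [
    {"lat": 40.2, "lon": -3.7, "limiting_mag": 3},
    {"lat": 40.8, "lon": -3.1, "limiting_mag": 5},
    {"lat": -33.9, "lon": 151.2, "limiting_mag": 6},
]
sky_map = median_limiting_magnitude(observations)
```

The median is chosen here only because it is less sensitive than the mean to a single careless report; a real analysis would also need to account for date, time, and observing conditions.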

Geo-games and gamification
Gamification is an emerging strategy used by cities to promote, persuade, invite, engage, and educate people through game-based approaches (e.g., geo-games and geoplay) that include a geospatial element for data collection, validation, and analysis of geo-information. With minimal cost, the crowdsourcing strategy utilizes gamification approaches to carry out activities by incorporating game elements that provide meaningful results for further analysis. Ahlqvist and Schlieder [97] provide detailed elaboration on spatial gamification from both within and outside the realm of geo-information science, highlighting its applications in education, spatial planning, tourism, product marketing, and other areas. Augmented reality games or gamified apps have been developed to encourage environmental exploration or reward users for documenting specific environmental features. Major geospatial organizations, such as OSM, WikiMapia, DigitalGlobe-tomnod, GeoWiki, and Zooniverse, have reported the gamification of their work related to managing innovation processes over the past decade [97]. Several important features define gamification: first, the focus on fun as the primary element to ensure an enjoyable game experience; second, the emphasis on collective intelligence to avoid information imbalances, where players generate information explicitly through commenting or rating, as well as implicitly without realizing it; and third, the utilization of stable architectures and gamification systems that obtain geographic information through people's activities, such as check-ins, place registrations, and message postings.
Building on the three aforementioned features, several typical examples of geo-games have been developed and are increasingly popular in the field. For instance, SocialVenue [98] is a location-sharing app designed to facilitate communication among users. Mag-ike (magic bike) is a biking game developed by a research team in Spain [99] with the aim of gathering crowdsourced commuting data. It employs a multi-cache approach, providing daily reports with accumulated scores and game status to help players improve their results. Gezgin is a geo-game application [100] developed to evaluate the benefits, values, and skills related to the global connections learning area of the social studies curriculum in Turkey. It is based on the four-component instructional design (4C/ID) model and incorporates expert opinions. NavApps, designed by Geotech, a research team in Germany [101], aims to raise awareness among high school students about their surrounding locations and educate citizens about existing services in smart cities, such as traffic conditions. Additionally, some games have been involved in evaluating the cultural and historical significance of cities. For example, Pokemon GO is primarily used to identify tangible attributes and values from textual descriptions, while Minecraft is a 3D block-building geo-game developed for (re)designing buildings, cities, and landscapes [102]. These games have gained popularity and attracted a large online community of players who contribute to the creation and adaptation of worlds, fostering autonomy, 3D and spatial awareness, creativity, and social interactions. They serve as emerging approaches for crowdsourced data generation and collection.

Crowdsourcing Human Observations
In the intricate landscape of human experiences and societal complexities, the methodology of crowdsourcing human observations has emerged as a pivotal instrument for achieving unparalleled granularity and scalability in data collection. By facilitating the contribution of real-time information from individuals, this model is effecting transformative changes across diverse sectors, ranging from healthcare to public safety. This participatory framework extends beyond mere data accumulation to actively involve communities, thereby amplifying voices that might otherwise remain marginalized. In the following sections, we explore the various dimensions of crowdsourcing human observations, examining its impact on areas such as health and wellness, the optimization of transportation systems, and the real-time capture of public sentiments and opinions. Furthermore, we discuss the utility of this approach in the arenas of disaster management and response, the enhancement of physical and social connectivity, and the fortification of public safety and security protocols. Each section aims to furnish readers with a nuanced understanding of the interplay among technological, ethical, and societal considerations, thereby offering a balanced view of both the benefits and limitations inherent in leveraging collective intelligence. The structure of our review on crowdsourcing human observations is presented in Fig. 3.

Health and wellness
Crowdsourcing technology has played a pivotal role in advancing health and medical research, providing enormous opportunities to transcend geographical and organizational barriers faced by traditional research processes. One of the most notable benefits of this technology in addressing complex medical and public health issues is the ability to accelerate the collection of health-related data from a large number of individuals across various geographic locations and demographic groups [103]. Powered by electronic/mobile health (e/m-health) and wearable sensor technologies, a wealth of individual-level geospatial data has contributed to groundbreaking discoveries that would otherwise be impossible due to the large number of research participants required for data collection [104,105]. For instance, Mappiness, a mobile application developed to collect large-scale geo-referenced data on subjective well-being, has allowed researchers to understand the effect of environmental aesthetics on happiness at a population level by capturing the spatiotemporal variability in happiness experienced in a wide range of environments [106]. The real-time nature of these data has been shown to significantly reduce recall bias, since researchers no longer have to rely on people's recollections of their feelings and locations. Similarly, crowdsourced disease surveillance platforms that enable real-time geospatial data collection, such as Mosquito Alert and Flu Near You, have become an essential means for open collaboration between public health professionals and the general public to identify geographic hot spots of infectious disease and implement timely interventions [107,108].
Because crowdsourced health data rely on self-selected samples, however, particular attention should be paid to the characteristics of the population generating the data and the possibility of under- or over-representation of certain population groups [109]. Some researchers thus suggest that crowdsourcing approaches are best suited for studies in which rapid data collection from a large group of people is crucial but a representative sample is not necessary [110], such as the geo-crowdsourcing application developed by Qin et al. [111] to offer people with mobility or visual impairments real-time information on the locations of navigation obstacles. However, when executed carefully, crowdsourcing approaches hold significant potential for addressing health disparities by facilitating engagement with underrepresented communities. Park et al. [112] utilized GeoAir2, a portable air sensor that requires neither technical proficiency from users nor a local Wi-Fi network for data collection (requirements that often hinder underserved groups' participation [113]), to ensure the inclusion of low-income immigrant communities in real-time air quality monitoring and empower them to take data-driven action. These examples demonstrate that crowdsourcing will become an increasingly powerful tool for addressing public health challenges as e/m-health and participatory sensing technologies continue to evolve.

Transportation
Transportation management is rapidly evolving into a data-driven discipline. The integration and analysis of data have become essential for enhancing efficiency, safety, and decision-making in transportation systems [114,115]. The advancement of Internet of Things (IoT) technologies and the widespread use of mobile devices have facilitated the active contribution and passive collection of vast amounts of traffic information from various road users [116,117]. This has opened new possibilities for cost-efficient solutions in transportation monitoring and management. Crowdsourced observations have gained widespread adoption in different transportation management tasks [118,119,120]. This review will primarily introduce two key applications of crowdsourced data in transportation studies: traffic volume estimation and road safety assessment.
Traffic volume is a critical component of transportation planning and management. Traditionally, traffic volume data were collected through stationary sensors or manual surveys, which are not only expensive but also limited in their spatiotemporal coverage [121]. Many studies have shown that speed patterns on roadways are closely related to volume patterns, making them valuable for estimating the volume of road segments not covered by traffic sensors. Researchers can potentially estimate traffic volume for extended areas by leveraging speed patterns from crowdsourced Floating Car Data (FCD) [122,123]. FCD refers to real-time or near-real-time driving information collected from individual vehicles moving through the road network [122]. Common sources of FCD include phone-based navigation apps (e.g., Google Maps, Waze, and HERE WeGo), connected vehicles, probe vehicles, and car-sharing and ride-hailing platforms
[...] 5 years of crash data to obtain statistically reliable results. Moreover, using crash data alone may lead to an underestimation of traffic risk as many unreported crashes, incidents, and near-miss events are not captured [126]. To address these limitations, an increasing number of studies are exploring the potential of surrogate safety measures, such as traffic conflicts and abnormal driving behaviors, in road safety assessments (e.g., identifying traffic blackspots) instead of relying solely on crash data [122,127]. With the advent of mobile sensing techniques, a vast volume of hazardous driving behaviors (e.g., hard braking, fast acceleration, and frequent lane changes) can be detected using crowdsensing solutions through phone- or vehicle-based sensors. Different crowdsourced driving behaviors, such as hard braking [128], driving jerks [129], and speed variations [130], have been proven to be strongly correlated with crash risks. Additionally, people can also actively report traffic incidents through mobile apps. For instance, Waze, a leading crowdsourcing platform, can efficiently collect traffic incidents reported by its registered users, proving to be a valuable data source for road safety assessment. By combining Waze-captured incidents with historical crash records, a more comprehensive understanding of traffic risks can be achieved [131].
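As an illustration of how a surrogate safety measure such as hard braking might be derived from crowdsensed speed traces, the following Python sketch flags intervals in which deceleration exceeds a threshold. The input format (timestamped speeds in metres per second) and the 3.0 m/s² threshold are assumptions made for this example; the cited studies may define hard braking differently.

```python
def hard_braking_events(samples, threshold_ms2=3.0):
    """Flag hard-braking events in a single vehicle's speed trace.

    `samples` is a time-ordered list of (timestamp_s, speed_m_per_s) tuples.
    Returns a list of (timestamp_s, deceleration_m_per_s2) for every pair of
    consecutive samples whose average deceleration meets the threshold.
    The threshold is an illustrative assumption, not a standard value.
    """
    events = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        deceleration = (v0 - v1) / dt
        if deceleration >= threshold_ms2:
            events.append((t1, deceleration))
    return events


# Hypothetical trace sampled once per second.
trace = [(0, 20.0), (1, 19.0), (2, 15.0), (3, 15.0)]
events = hard_braking_events(trace)
```

Counting such events per road segment would give one possible input to a blackspot analysis, alongside reported incidents and historical crash records.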
In addition to applications in traffic volume estimation and road safety assessment, crowdsourced data have also found successful use in various transportation management tasks, including traffic congestion detection [121], road surface assessment [119,132], transport asset management [133], and transportation planning [134]. These applications demonstrate the great potential of crowdsourced information in supporting the establishment of intelligent transport systems.

Public emotions, sentiments, and opinions
In comparison to traditional techniques such as surveys and questionnaires, crowdsourcing is particularly effective in studies that necessitate broad spatiotemporal coverage and involvement of large population sizes. One of the major arenas where crowdsourcing shines is in the extraction and interpretation of public emotions, sentiments, and opinions. The rapid advancements in natural language processing and deep learning techniques have substantially enhanced the application of crowdsourced data in sentiment analysis [135,136]. Social media, offering an extensive, continually updated, and diverse array of user-generated content, serves as a rich data source for sentiment analysis. Platforms like Twitter, Flickr, Weibo, and Facebook are particularly significant, as sentiment analysis techniques applied to data from these platforms have found extensive usage across various disciplines, providing invaluable insights that shape strategic decision-making [137,138,139,140].
The application of sentiment analysis using social media data spans multiple areas. In healthcare, it helps in monitoring public attitudes toward healthcare policies, tracking disease outbreaks, and understanding the social and psychological impacts of various health conditions [141,142,143]. In politics, it is used for predicting election outcomes by assessing public sentiment toward general elections and users' reactions to political campaigns [144,145,146]. Furthermore, the technique has been employed to analyze public perceptions of the urban built environment, thereby informing urban landscape planning [147]. In the field of marketing, sentiment analysis plays a crucial role in understanding consumers' or employers' sentiments toward products and brands, informing strategic marketing decisions [148,149].
Within the realm of public health, the application of sentiment analysis using crowdsourced data, particularly from social media, has been especially impactful. For instance, Broniatowski et al. [150] leveraged Twitter data as a surveillance tool to concurrently monitor influenza cases and the related public reactions. Ahmed et al. [151] adopted a similar approach during the 2009 H1N1 pandemic, using Twitter data to dissect public sentiment and responses. Building on such approaches, Müller and Salathé [152] introduced "Crowdbreaks," an open-source platform that streamlines this process with crowdsourced labeling, enabling efficient sentiment analysis of health trends in real time and thereby accelerating the pace of research in the public health domain.
The recent worldwide coronavirus disease 2019 (COVID-19) pandemic saw a significant application of a similar approach in understanding public sentiment related to the disease and its respective vaccines. Several researchers, including but not limited to Ibrahim et al. [153], Hussain et al. [154], and Hussain et al. [155], have demonstrated the value of sentiment analysis using social media data. Ibrahim et al. [153] built a Hierarchical Twitter Sentiment Model to discern sentiment polarities within COVID-19-related tweets, while Hussain et al. [154] extended the use of artificial intelligence (AI) techniques to analyze over 300,000 social media posts from Facebook and Twitter about COVID-19 vaccines in the UK and the US. Complementing this, Hussain et al. [155] conducted a parallel analysis of posts discussing adverse effects following immunization, effectively underscoring the utility of social media analysis as a supportive mechanism in traditional pharmacovigilance. This broadened usage of crowdsourced data in sentiment analysis signifies its critical role in comprehending and navigating public sentiment during health-related crises.

Disaster management and response
The real-time nature of crowdsourced data enables rapid response to dynamic situations, mitigating the impacts of disasters. It leverages the collective intelligence and capabilities of diverse individuals, tapping into a wide range of knowledge and experiences to offer a holistic view of disaster situations. Importantly, crowdsourcing democratizes disaster management by empowering communities to contribute to response efforts, fostering resilience at a grassroots level.
A number of applications and tools have been developed to harness the power of crowdsourcing in disaster management and response. For example, Ushahidi is an open-source platform designed to crowdsource crisis information, visualizing these data on a map to provide situational awareness during disaster events. Similarly, HOT uses crowdsourcing to generate geographic data, aiding relief organizations in their operations. During the 2010 earthquake in Haiti, Ushahidi was used to collect and map incidents of collapsed buildings, trapped individuals, and medical emergencies. This allowed emergency services to prioritize their resources effectively [156]. Meanwhile, there was an immediate need for high-quality maps of the affected areas to aid in rescue and relief operations. HOT used available satellite imagery to map the affected areas, and a virtually blank map was transformed into a detailed spatial dataset in a short time. FloodCrowd is a UK-based citizen science project that crowdsources flood reports from the public. Using a simple online form, anyone can report flooding incidents, providing key details, such as the location, timing, and impact of the flood. The collected data are then used to improve the understanding and modeling of flood risks. Many other tools, such as Crisis Cleanup and Safecast, also illustrate the transformative role of crowdsourcing in optimizing disaster management strategies and actions.
In addition to the tools above, researchers often use data from social media platforms (e.g., Twitter, Weibo, and Facebook) to gain situational awareness and improve disaster response [157]. The gathered data are analyzed using Geographical Information System (GIS) and natural language processing techniques to extract useful information such as sentiments, needs, and locations of affected individuals. For example, Ashktorab et al. [158] developed Tweedr, an architecture for collecting and analyzing Twitter data to identify actionable information for disaster response. Tweedr was shown to effectively mine disaster-related information from the vast amount of data on Twitter. De Albuquerque et al. [159] proposed an approach for integrating social media data with authoritative data for improved disaster management. The authors demonstrated that social media could provide timely, geographically diverse information that complements traditional data sources. Wang et al. [160] examined the use of social media in managing flood emergencies in urban areas, focusing specifically on the 2012 Beijing rainstorm. They used data from Weibo, a popular Chinese microblogging site, to highlight the potential of social media in real-time information dissemination and public participation in flood management. To improve the quality of data available to emergency responders, Tien Nguyen et al. [161] developed a deep learning model to automatically filter out irrelevant images from social media during crises. These studies illustrate the valuable insights that can be gleaned from social media data during disasters, from improving situational awareness to understanding public sentiment and aiding in disaster response coordination. However, these researchers also note the challenges, such as data validity and privacy concerns, that must be addressed when using such data.

Physical and social connectivity
The emerging "Web 2.0" and "Citizens as Sensors" paradigms represent forward-thinking concepts for leveraging the potential of crowdsourcing in the accumulation of digital imprints left by users of digital devices, capitalizing on the burgeoning trend of geopositioning technologies. Both passive and active data collection means have greatly facilitated our understanding of physical connectivity, quantified by human movement patterns. Passive data collection, for instance, comprises information acquired from sources such as mobile phone GPS [162,163], smart card transactions [164,165], and wireless networks [166,167]. The spatial connections originating from these passive traces often exhibit high degrees of representativeness, due to their broad data penetration ratios [168]. This prevalence, nonetheless, prompts significant apprehensions regarding privacy and confidentiality. An alternative that offers reduced intrusion and ameliorates privacy concerns incorporates spatial data gathered from social media platforms [169,170,171]. Given the active sharing characteristic inherent in these platforms, data extracted from social media sources are typically less abundant compared to passively collected GPS locations from mobile devices. The physical interlinkages derived from the above-mentioned sources have been deciphered and employed for a myriad of purposes. These include transportation planning [172,173], disease modeling [174,175,176], identification of urban functional zones [177,178], disaster management [179,180], and marketing and business development [181,182].
Transitioning to social connectivity, crowdsourced data present a rich reservoir for understanding and analyzing social interactions and patterns. One principal source of social connectivity data is online social media platforms, where user-generated content can provide significant insights into social behavior and community dynamics. These platforms inherently encourage interaction, engagement, and social sharing, resulting in a plethora of records that, when analyzed, reveal a complex network of social relationships [183]. An in-depth examination of engagement indicators, encompassing likes, shares, comments, and even the subtle nuances of language usage, offers a robust methodology for gauging sentiment [184], identifying social affiliations [185], and discovering shared interests [186]. Furthermore, the applicability of social media transcends the realm of direct social engagement. It serves as an invaluable tool in tracing the spread of information [187,188], monitoring societal attitudes [189,190], and understanding online communities [191]. The advent of location tagging introduces a new spatial facet to social connectivity, further enriching the depth and breadth of analytical possibilities.

Public safety and security
Traditional data that measure public safety and security are typically collected through government agencies, law enforcement organizations, research institutions, and other formal sources [192]. Examples include crime reports that document various criminal activities, records of incidents requiring immediate assistance collected from emergency call centers, and hospital records of injuries resulting from incidents related to public safety [192]. Those types of data are typically well-established and reliable. However, they may not capture dynamic changes in the cases and may not be shared with the public promptly. Crowdsourced data help address these limitations through a rapid and cost-effective data collection and sharing process.
Mobile phone applications are popular tools for collecting and sharing crowdsourced data that can affect public safety and security. For example, the Citizen app, an application that keeps users updated about nearby crimes, accidents, and emergencies in real time, sources information from police scanners and application user reports. Based on a recent online survey, 87% of the participants expressed their willingness to use the Citizen app to report accidents, crime, and corruption [193]. Researchers also find that the Citizen app generates earlier notifications in traumatic cardiac arrest compared with standard Emergency Medical Service radio communications [194]. The out-of-hospital information provided by the app may create a complementary source for the emergency department to make rapid resuscitative decisions for upcoming patients [194].
Crowdsourced data can also be used to serve specific populations. One example is Stop AAPI Hate, a website that operates the nation's largest reporting center tracking acts of hate against Asian Americans and Pacific Islanders (AAPI). The website was initiated in 2020 in response to the rise in anti-AAPI hate during the COVID-19 pandemic. From March 19, 2020, to December 31, 2021, a total of 10,905 hate incidents against the AAPI community were reported through this website [195]. The data collected have been used in research and reports measuring experiences related to AAPI hate incidents [196,197,198]. The collection of crowdsourced data in support of public safety and security is a global effort. In Egypt, HarassMap was initiated to encourage people to report instances of sexual harassment via texting or internet reporting [199]. The reports are then plotted on a map, highlighting hotspots of such activities. This initiative has inspired people in other countries to establish similar systems, for example, Safe City in India, Harasstracker in Lebanon, and Biyoya in Bangladesh.
Although crowdsourcing has the benefit of collecting timely data through novel and cost-effective approaches, it is still subject to numerous concerns and challenges. For example, the information collected might not be reliable, as some of it comes from unverified sources. Additionally, such systems might increase people's anxiety about public safety [200]. Participants who reported using neighborhood apps perceived local crime rates as higher than those who did not use the apps, independent of actual crime rates [201]. Moreover, the Citizen app mentioned above was originally released as Vigilante, an application banned in 2017, which encouraged users to develop a vigilante-style network to protect themselves from potential offenders before the police needed to intervene. Such an app has the potential to incite violence and put innocent people in danger. Citizen did not seem to have learned from this experience; rather, it launched a dangerous effort to find, through its app users, an arsonist responsible for a wildfire in Los Angeles' Pacific Palisades neighborhood [202]. Thus, regulation is needed to ensure that crowdsourcing efforts are used legally and ethically to protect public safety and security.

Challenges in Crowdsourcing Earth and Human Observations

Data quality and accuracy
Data quality is of paramount importance when discussing crowdsourced geoinformation. Crowdsourced data can vary significantly in quality and accuracy because contributors differ in their levels of expertise, commitment, and access to high-quality recording devices. Data validation and quality control processes are essential but can be complex and resource-intensive [203]. Furthermore, due to the anonymity of crowdsourcing platforms, there is an inherent risk of vandalism [204]. Prior to utilizing crowdsourced data in experiments, applications, or projects, stakeholders typically seek to understand the quality of the data to a certain degree. Nevertheless, there are persistent challenges from various perspectives regarding data quality.
On the one hand, establishing appropriate criteria for quality assessment is challenging due to the evolving nature of crowdsourced data and the different application-specific requirements in terms of input data quality. These data might introduce novel types of information and geographic features that are not present in authoritative databases, rendering the evaluation of such new data difficult. Existing research has proposed quality criteria for crowdsourced data, which often encompass geometric, temporal, and positional accuracy, data completeness, logical consistency, and fitness for purpose [205]. However, delineating precise thresholds to categorize quality, such as distinguishing between high, medium, or low quality, is problematic, given the varying perceptions of quality across different domains.
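Two of the criteria listed above, positional accuracy and completeness, can be quantified in a simple way when a reference dataset exists. The Python sketch below is a minimal illustration under stated assumptions: features are already matched one-to-one between the crowdsourced and reference datasets, coordinates are in decimal degrees, the Earth is treated as a sphere of radius 6,371 km, and completeness is taken as a plain ratio of feature counts. The function names are hypothetical and do not correspond to any specific framework cited in this review.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def positional_rmse_m(matched_pairs):
    """Root-mean-square positional error over matched feature pairs.

    `matched_pairs` is a list of ((lat, lon), (lat, lon)) tuples:
    crowdsourced location first, reference location second.
    """
    squared = [haversine_m(*crowd, *ref) ** 2 for crowd, ref in matched_pairs]
    return math.sqrt(sum(squared) / len(squared))


def completeness(n_crowdsourced, n_reference):
    """Share of reference features that also appear in the crowdsourced data."""
    return n_crowdsourced / n_reference


# Hypothetical matched points and feature counts.
pairs = [((0.0, 0.0), (0.0, 0.0)), ((0.0, 0.0), (0.0, 1.0))]
rmse = positional_rmse_m(pairs)
share = completeness(80, 100)
```

Whether a given error or completeness value counts as acceptable still depends on the application, which is exactly the threshold problem noted above.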
On the other hand, executing quality assessments presents its own set of challenges.In many instances, reference data (typically sourced from authoritative or commercial databases) may not be readily available due to the prohibitive costs associated with acquisition, especially at a large scale.Even when such reference data are accessible, its utility in quality assessments is often limited due to its slower update frequency relative to crowdsourced data.While some might advocate for intrinsic quality assessment methods [206], the absence of standardized quality criteria and associated methodologies introduces ambiguity into the assessment process.
In addition, new kinds of crowdsourced data may have their own particularities that render currently established data quality assessment procedures and standards insufficient, presenting another challenge. For example, researchers have identified that crowdsourced street-level imagery (see the "Urban planning and infrastructure" section) has a number of quality aspects that were not foreseen in existing approaches to gauging the quality of crowdsourced geographic information, and they have been working on a quality assessment framework tailored to this form of data [207].

Data bias
Crowdsourced geospatial data inherently contain biases because they rely on voluntary contributions, leading to less credible inferences compared to conclusions drawn from a randomly sampled population [208]. These biases derive from various factors and sources and can affect the reliability, representativeness, and usability of the data. Common categories of biases encompass spatial biases, temporal biases, demographic biases, cognitive biases, and systematic biases [208,209,210,211,212]. Spatial biases in crowdsourced geospatial data occur as a result of the unequal geographical distribution of contributors or specific local characteristics of crowdsourcing tasks, causing certain regions or places to receive higher contributions due to factors such as popularity, population density, internet accessibility, and task localization [213,214,215,216]. Temporal biases can arise when crowdsourcing tasks are limited to specific events or time frames, potentially distorting the comprehensive understanding of individuals' characteristics, behaviors, or opinions over time and resulting in biased insights or conclusions [217,218].
Beyond spatial and temporal considerations, demographic biases play a substantial role in influencing the representativeness of crowdsourcing participants concerning age, gender, education, socioeconomic status, culture, and other demographic attributes [212,219,220,221]. These biases typically take the form of overrepresentation of young, male, well-educated, technologically literate, and affluent segments of the population [208,212,213,217,222,223], resulting in a lack of diversity and inclusivity in the participant pool and subsequently leading to an imbalance in perspectives and experiences. Additionally, other inherent biases, such as cognitive biases, derive from limitations in human cognitive processes, and systematic biases arise due to flaws in the data collection process, study design, or analysis methods. Both biases require attention when assessing the quality and representativeness of crowdsourced geospatial data [224,225]. They can manifest in varying ways, impacting the responses of participants and the decision-making of researchers and decision-makers.
Different strategies have been explored to effectively mitigate and address these data bias challenges in crowdsourced geospatial data: first, performing data preprocessing, reweighting, or sensitivity analysis to reduce the impact of biases on downstream analysis or modelling [211,219]; second, combining crowdsourced data with authoritative geospatial sources to gain a more balanced view of the data [168,208,226]; third, designing tasks with clear guidelines and varying perspective considerations, such as relying on long-term trends and unbiased statements or questions [211,224]; fourth, encouraging a broad range of contributors with different demographic backgrounds from different regions for inclusivity and diversity [220,227]; finally, implementing quality control measures during the crowdsourcing process to help identify and filter out biased responses [225,228]. Such measures can involve prescreening workers, incorporating validation questions, or using redundancy to compare multiple worker responses.
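To make the reweighting strategy concrete, the following Python sketch applies simple post-stratification: each contributor group is weighted by the ratio of its share in a reference population to its share in the crowdsourced sample. The group labels, population shares, and function names are hypothetical, and post-stratification is only one of several reweighting schemes that could be used here.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight per group = population share / observed sample share.

    sample_groups: list of group labels, one per contribution.
    population_shares: dict mapping each label to its share (0..1) in a
    reference population, e.g. from a census (assumed to be available).
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights = {}
    for group, count in counts.items():
        if group not in population_shares:
            raise KeyError(f"no population share for group {group!r}")
        weights[group] = population_shares[group] / (count / n)
    return weights


def weighted_mean(values, groups, weights):
    """Mean of values, with each value weighted by its group's weight."""
    numerator = sum(v * weights[g] for v, g in zip(values, groups))
    denominator = sum(weights[g] for g in groups)
    return numerator / denominator


# Hypothetical sample: young contributors are overrepresented (80% vs. 50%).
groups = ["young"] * 8 + ["older"] * 2
values = [1.0] * 8 + [0.0] * 2
w = poststratification_weights(groups, {"young": 0.5, "older": 0.5})
adjusted = weighted_mean(values, groups, w)  # pulls the raw mean of 0.8 toward 0.5
```

Reweighting of this kind corrects only for the attributes that are observed; groups that are entirely absent from the sample cannot be recovered by weighting.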

Data privacy
Apart from inherent data biases, the use of crowdsourced data, particularly from social media or mobile devices, can raise data privacy concerns. One of the primary data privacy issues is location privacy. Geospatial information often reveals precise details about individuals' whereabouts, activities, behaviors, or even their home addresses. Without adequate safeguards, such data could be exploited by malicious actors to track or identify individuals, leading to potential risks related to personal safety and security [229,230]. Furthermore, crowdsourced geospatial data might inadvertently contain personal identifiers, such as usernames or profile information, which could lead to the re-identification of individuals. Studies have shown that it is still possible to re-identify individuals through cross-referencing with external datasets [230,231,232]. This poses a significant challenge as it compromises the anonymity of contributors and exposes them to potential privacy breaches. Additionally, third-party access to and sharing of data held by crowdsourcing platforms, for example for research or commercial use, pose further privacy challenges. This raises concerns about how these entities handle the data and whether they adhere to privacy regulations. The lack of explicit consent for data sharing may lead to unexpected data usage, emphasizing the need for transparent data sharing policies and stringent agreements with third-party partners [233,234].
It is crucial for organizations and researchers to implement robust privacy measures, uphold ethical standards, and comply with relevant data protection regulations to ensure the confidentiality and security of crowdsourced geospatial data while maximizing its potential for beneficial insights. Various techniques have been proposed to avoid violating user privacy, aiming to strike a balance between data utility and individual privacy and to ensure responsible data usage. Anonymization, aggregation, and privacy-preserving methodologies are essential strategies to mitigate location privacy risks [231,235,236]. Specifically, these include anonymizing the geospatial data by removing direct identifiers, such as names and direct details; aggregating data to coarser spatial or temporal resolutions to enhance privacy by obscuring specific locations; and implementing clear and transparent data sharing policies that make the consent mechanism explicit for contributors. To help protect the identity of participants and minimize the risk of re-identification, techniques like pseudonymization and data encryption can be employed, with encryption enabling secure computation on encrypted geospatial data [229,233]. Secure multi-party computation is also a promising approach that allows multiple parties to jointly analyze data without sharing raw information [237].
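The aggregation strategy can be illustrated with a short Python sketch that snaps point locations to a regular latitude/longitude grid and suppresses cells with too few contributions. The cell size, the minimum count, and the function name are illustrative assumptions; this is plain spatial aggregation with small-count suppression and does not by itself provide a formal privacy guarantee.

```python
import math
from collections import Counter


def aggregate_to_grid(points, cell_deg=0.01, min_count=5):
    """Count points per grid cell and drop sparsely populated cells.

    points: iterable of (lat, lon) tuples in decimal degrees.
    cell_deg: cell size in degrees (hypothetical default, roughly 1 km in latitude).
    min_count: cells with fewer points are suppressed so that isolated
    contributors are not exposed (hypothetical threshold).
    Returns a dict mapping (row, col) cell indices to counts.
    """
    counts = Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Five reports in the same cell are kept as one count; the lone report is dropped.
reports = [(51.5001, -0.1201)] * 5 + [(48.85, 2.35)]
cells = aggregate_to_grid(reports, cell_deg=0.01, min_count=5)
```

Coarser cells and higher thresholds strengthen protection at the cost of spatial detail, which is the utility and privacy trade-off described above.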

Legal and ethical issues
Using crowdsourced data inevitably raises legal and ethical issues due to the nature of the data, which is typically contributed by a diverse group of individuals. These issues include concerns about data ownership, intellectual property rights, and liability, which have been extensively discussed in the literature on citizen science [238]. Unlike traditional data sources where ownership is often more straightforward, determining the ownership rights of crowdsourced data can be complex since there are usually no clear guidelines or agreements regarding ownership and usage rights. This ambiguity gives rise to concerns regarding privacy, intellectual property, and the potential exploitation of contributors' data. Data scientists must navigate this ethical and legal landscape by establishing transparent protocols, consent mechanisms, and fair compensation approaches to ensure that the rights of contributors are protected while harnessing the full potential of crowdsourced data for valuable insights and innovation.
More specifically, crowdsourced Earth observation data, such as OSM, require users to attribute the source of the data and share any derivative works under the same license [239]. It is crucial to understand and comply with the licensing terms when using OSM data to avoid legal repercussions. OSM has a strong community that follows specific guidelines and norms. Ethical considerations involve respecting the principles of the OSM community, such as refraining from vandalizing or misrepresenting data, giving proper credit to contributors, and collaborating with the community to improve the dataset. In contrast, the legal and ethical issues surrounding crowdsourced human observation data, such as tweets, are even more significant. There is a growing need for universal guidelines addressing the ethics of social media research, particularly concerning the privacy and anonymity of social media users. Although social media data are often claimed to be anonymized, sharing such data via public repositories and platforms should involve discussions on obtaining consent and/or ethical approval for research purposes [240]. This is especially crucial for datasets containing user profile information, as these datasets can be potentially identifiable through cross-referencing data attributes [10]. While adhering to data sharing regulations and the principles of reproducibility, it is important to approach the sharing of processed social media data via public repositories and platforms with caution and establish reproducible workflows that can be utilized by end-users without a coding background.

Sustainability of data collection
The sustainability of data collection in crowdsourcing initiatives can be a significant challenge, because it heavily relies on the active participation and engagement of volunteers. As noted by Newman et al. [241], the sustainability of such efforts can be compromised when volunteers lose interest, leading to a drop in data input, and consequently affecting the efficacy and validity of the gathered data.
Moreover, sustainability is also influenced by the nature of the community driving the project. The absence of ongoing community involvement can contribute to this dwindling interest, leading to sporadic data collection that lacks consistency and continuity. According to Starbird and Palen [242], the effectiveness and sustainability of crowdsourcing efforts, especially during crisis situations, can be significantly improved with the presence of dedicated coordinators who can motivate volunteers, manage and direct efforts, and ensure that data collection continues in an organized and systematic manner.
Institutional support can also play a crucial role in the long-term sustainability of crowdsourcing initiatives. With the necessary resources and funding, institutions can maintain motivation and engagement among volunteers through incentives, training, and recognition of efforts. This can ensure the continued flow of data and enhance the sustainability of the project over time [243].
Therefore, sustainable crowdsourcing efforts, especially in terms of data collection, need strategic planning, community engagement, and strong institutional support. These factors can ensure the continuation of volunteer participation and data collection over prolonged periods, thus ensuring the effectiveness of crowdsourcing initiatives.

Data interpretation
The process of interpreting crowdsourced data, especially when employed for scientific research, is fraught with complexities attributable to the diverse nature of the data and the potential dearth of metadata. The task of extracting consequential insights from crowdsourced data, as applied to Earth and human observations, is underscored by the substantial challenge of data interpretation. Despite the surge in accessible information (e.g., OSM) and the evolution of sophisticated tools designed to manage these data (e.g., OSM Analytics Tool), the endeavor of unraveling the salient meaning and implicit subtleties within the data poses a demanding task.
The process of interpreting crowdsourced data necessitates a meticulous traversal of a multifarious landscape of informational noise [244,245]. Data acquisition, a composite process entailing collection from an extensive range of sources, each varying in its level of expertise, precision, and consistency, often culminates in datasets marked by heightened complexity and diversity. This inherent heterogeneity, while advantageous to crowdsourcing, amplifies the task of isolating accurate, germane signals amidst an expanse of potentially discordant or erroneous data. The task of data interpretation is further intensified by the intrinsic subjectivity associated with human observations. Such data, commonly influenced by personal biases [246], perceptual variations [247], and varying levels of comprehension and expressive proficiency among contributors [66], can exert considerable influence over the final output. The dearth of a robust system to temper these variables could elevate the likelihood of data misinterpretation, potentially leading to skewed deductions and misplaced strategic decisions, thereby emphasizing the need for rigorous analytical approaches in the scientific processing and interpretation of crowdsourced data.
Addressing spatial and temporal variations is a critical aspect in the interpretation of crowdsourced data, particularly for Earth and human observations. There can be notable fluctuations in the quality and frequency of data across distinct geographical areas and over varying time periods, thereby presenting significant hurdles in synthesizing a holistic and globally representative interpretation [248,249]. These inconsistencies mandate thorough attention and the employment of advanced analytical methodologies to enable trustworthy interpretations. Moreover, the absence of uniform protocols for data validation and verification intensifies the complexities involved in data interpretation [104]. Yet, the formulation and execution of such protocols pose significant challenges, especially considering the characteristically decentralized and often anonymized nature of crowdsourcing initiatives.
Responding to these challenges necessitates the adoption of inventive and rigorous methodologies for data management, analysis, and interpretation. We argue that emphasis should be placed on the evolution of more advanced machine learning algorithms capable of filtering and standardizing crowdsourced data. This should occur in tandem with the application of robust statistical approaches designed to rectify inherent biases and discrepancies within the data, thereby ensuring that subsequent interpretations of the data retain their validity and accuracy.

Training and education
In the realm of crowdsourced data collection, it is crucial to provide proper training and guidelines to volunteers to ensure the quality and consistency of the collected data. Data collectors, who are often volunteers, play a significant role in crowdsourcing initiatives by contributing their time and efforts to gather valuable information. However, without adequate training, the data collected may vary widely in terms of accuracy, completeness, and adherence to predefined standards. Data scientists should establish comprehensive training programs that equip volunteers with the necessary skills, knowledge, and understanding of the data collection process. This includes educating them about specific data requirements, providing clear instructions on data collection techniques, and familiarizing them with any relevant tools or technologies. An emerging trend in this domain is the use of robots to cope with data processing, such as the development of Roboturk, a crowdsourcing platform for robotic skill learning through imitation [250]. However, it should be noted that training robots involves different requirements and infrastructure compared to training human workers. Regardless of the subjects involved in data collection and manipulation, data scientists should be prepared for unexpected outcomes, as design choices in data collection can have a significant impact on the quality of crowdsourced user-generated content [251].
To achieve better results from training and education in data collection and manipulation, several key steps can be considered to ensure their effectiveness and the quality of the collected information. First, we need to begin by clearly defining the objectives and requirements of the data collection project. This includes specifying the type of data needed, the desired format, and any specific guidelines or standards to be followed. Second, there is a need to create comprehensive training materials that cover all aspects of the data collection process. These materials should be accessible, be easy to understand, and provide step-by-step instructions, including visual aids, examples, and real-world scenarios to facilitate learning. Third, it is important to offer opportunities for volunteers to gain hands-on experience by conducting practice data collection exercises. This can be done through simulated scenarios or by providing sample datasets for practice. Volunteers are encouraged to seek feedback and address any questions or concerns they may have during this practice phase. It is crucial to organize training sessions where volunteers can learn directly from data experts or experienced team members. These sessions can be conducted in person, through webinars, or using online platforms. Fourth, the introduction of data quality control measures is necessary to ensure the reliability and consistency of the collected data. This can involve periodic reviews, validation checks, or random audits of the data collected by volunteers. Meanwhile, we should provide feedback and constructive suggestions to help volunteers improve their data collection techniques. Fifth, a supportive and collaborative environment is needed where volunteers can share their experiences, ask questions, and learn from one another. This can be achieved by establishing communication channels, such as discussion forums or chat groups, where volunteers can interact and seek guidance from data experts or project coordinators. Finally, offering regular training and support throughout the data collection process could ensure that volunteers receive regular updates and refresher sessions, and that any issues or challenges that arise during the process are addressed. By following these steps, data scientists can effectively train volunteers in collecting crowdsourced data, ensuring a high level of quality, consistency, and adherence to project requirements.

Future Directions and Pathways
Harnessing the power of the crowd: Expanding the scope of geospatial crowdsourcing
Navigating the evolving landscape of crowdsourcing geospatial data collection and analysis reveals transformative perspectives. These include harnessing the temporal dimension, leveraging advanced AI and machine learning, integrating IoT technologies with crowdsourcing, and prioritizing inclusivity, particularly from underrepresented regions such as the Global South. We believe that the amalgamation of these insights is poised to significantly reshape our methodologies, enriching our understanding of the world through a comprehensive and representative approach to geospatial crowdsourcing. We illustrate these four perspectives in detail below.

Embracing the fourth dimension
Presently, a significant portion of crowdsourcing initiatives in geospatial data accumulation predominantly concentrates on static data. Nevertheless, prospective endeavors possess the capability to transcend conventional limitations by integrating the fourth dimension: time. By assimilating this temporal aspect more proficiently, geospatial crowdsourcing can expedite real-time or near-real-time data collation and evaluation. This dynamic strategy has the potential to enhance our competency in cultivating a more exhaustive and nuanced comprehension of our environmental milieu. Furthermore, it capacitates timely reactions to emerging circumstances and challenges, thereby fostering more informed decision-making processes and proactive initiatives. The incorporation of this temporal facet into geospatial crowdsourcing broadens the spectrum of potentialities and empowers us to harness the collective intelligence of the masses to stimulate consequential and impactful results.

Deepening the wisdom of crowds
The intensification of collective intelligence in geospatial crowdsourcing signifies a compelling venture to exploit avant-garde AI and machine learning methodologies. Utilization of these state-of-the-art technologies empowers the extraction of more intricate and sophisticated insights from the amassed data, thereby augmenting traditional analytical frameworks. AI and machine learning algorithms harbor the capacity to reveal latent patterns, associations, and tendencies inherent in geospatial data, facilitating an enhanced comprehension of our environment. These methodologies can supplement human capabilities by processing extensive quantities of data quickly and efficiently, discerning intricate spatiotemporal patterns, and offering predictive analytics. By amplifying collective intelligence through AI and machine learning, we can unlock unprecedented layers of comprehension and catalyze innovative solutions in geospatial analysis. Ultimately, this contributes to the refinement of decision-making processes and promotes sustainable development.

Seamless integration of IoT and crowdsourcing
The advent of the IoT offers an extraordinary opportunity for seamless integration with crowdsourcing initiatives. There lies tremendous potential in amalgamating sensor data emanating from diverse sources with crowdsourced information to furnish a more enriched, comprehensive depiction of our planet and human perceptions. This multifaceted integration not only optimizes the capacity to acquire extensive datasets but also enhances the depth of analysis by incorporating the vastness of sensor-based IoT data. This convergence of technologies empowers us to derive a finer granularity of insights and, ultimately, a more robust understanding of the patterns and processes shaping our world. Consequently, the synthesis of IoT and crowdsourcing technologies signifies an innovative stride toward more comprehensive and informed decision-making, fostering a proactive approach in our interactions with the environment.

Encouraging citizen science in the global south
Substantial efforts need to be marshaled to invigorate participation from areas that are currently underrepresented, particularly the Global South, within the sphere of crowdsourcing sciences. The adoption of an inclusive strategy for data collection promotes the cultivation of a more balanced, representative, and comprehensive database. This approach ensures the capture of diverse perspectives, thereby enriching our comprehension of multifarious geospatial phenomena. The proactive integration of these regions provides a crucial conduit to bridge extant data voids while fostering knowledge sharing and capacity development. Moreover, it engenders a sense of communal responsibility and global collaboration directed toward understanding and mitigating shared challenges. Hence, we believe that the advancement of Citizen Science in the Global South marks a vital stride toward shaping a more equitable and insightful scientific terrain, profoundly contributing to the enhancement and inclusivity of our global data reservoir.

Pioneering a sustainable crowdsourcing ecosystem: From motivation to retention
In the contemporary digital landscape, the opportunity has emerged for citizens to significantly contribute to scientific advancements via crowdsourcing. For this potent instrument to realize its full potential and to make it sustainable, it is imperative to fortify several foundational elements. This entails constructing a unified community of dedicated citizen scientists and crafting incentives that optimally balance motivation with genuine engagement. Equally important is the commitment to inclusivity, ensuring that technological progress does not inadvertently result in disparities or omit specific groups. Central to this endeavor is comprehensive education, which guarantees that participants are not only adept at their tasks but also cognizant of the wider ramifications of their input. We illustrate these four perspectives in detail below.

Building a robust community of citizen scientists
Developing strong communities around these efforts can improve the long-term sustainability of data. Cultivating a robust community of engaged citizen scientists is imperative for the longevity of crowdsourcing initiatives [247]. Projects that foster a sense of collective purpose and belonging can promote prolonged contributions from volunteers [252]. For instance, eBird's passionate birder community and discussion forums create social incentives that sustain participation [253]. Effective community-building entails establishing open communication channels, providing mentorship opportunities, and encouraging a participatory culture where volunteers feel valued in the scientific process [254]. Decentralizing leadership and facilitating collaborations via workshops and events also strengthen communal bonds [255]. Modular and personalized training resources further enhance sustainability by enabling volunteers to develop relevant skills while recognizing their contributions' significance [256]. For example, CitSci.org's adaptive courses on gathering field data provide tailored learning pathways based on needs and schedules, ensuring broad accessibility.

Incentivizing participation
Besides intrinsic motivations, crowdsourcing projects should explore supplementary incentives for attracting and retaining contributors [257]. These could include reputational rewards like leaderboards, milestone badges, and opportunities for public recognition [258]. More tangible benefits may include discounts on project merchandise, premium account features, or prize giveaways for active participants [259]. However, caution is necessary to avoid over-gamifying participation or introducing disproportionate incentives that skew data [253]. The SciStarter Project Finder illustrates how participants can be incentivized via different benefit categories (e.g., career development and social engagement), displayed transparently alongside each project [260].

Bridging the digital divide
Bridging digital divides is also critical for pioneering an inclusive crowdsourcing ecosystem, as technological and socioeconomic barriers can perpetuate representation gaps [261]. For example, community-driven monitoring of local air quality using low-cost sensors revealed participation discrepancies along socioeconomic lines [262]. Targeted outreach, infrastructure development, and offline participation options can help engage marginalized communities [263]. LOCALE facilitates neighborhood-level data collection by providing local access to equipment and training [262]. Text and telephone reporting systems also expand access, as exemplified by Mosquito Alert's multichannel disease surveillance [264]. Ensuring wide accessibility promotes representative data inputs unconstrained by demographic factors.

Education and training initiatives
Lastly, comprehensive education and training initiatives raise awareness of crowdsourcing's significance while empowering quality contributions [265]. Interactive workshops with field components enhance skills and data literacy for diverse audiences from students to policymakers [266]. For example, Public Lab's community events build capacity for using low-cost tools for environmental monitoring through hands-on learning [267]. Online resources like tutorial videos, customized teaching modules, and webinars enable self-paced learning. Knowledge exchange forums allow participants to learn from each other [268], as exemplified by the Cornell Bird Academy fostering an educative birder community [265]. By imparting skills and communicating larger purposes, robust education sustains crowdsourcing participation while benefiting society.

From data to action: Translating crowdsourced geospatial data into real-world impact
There are several pathways toward translating the analytical results generated by crowdsourced geospatial data into real-world impact. First, crowdsourced data play a vital role in informing policy decisions and driving policy changes, particularly in the domains of environmental [262], health [203], and urban planning policies [269]. This role has become increasingly important, especially following the outbreak of the COVID-19 pandemic [10]. Additionally, crowdsourced data have immense potential in advancing scientific research and enabling scientists to gather data at scales and resolutions that were previously unattainable. The abundance of crowdsourced Earth observation data (e.g., OSM and Mapillary) and human observation data (e.g., sentiment measures derived from social media) with extensive temporal and spatial coverage facilitates global or nationwide time-series studies [190,221,248]. The analytical results, encompassing large spatial and temporal coverage, provide evidence that can be compared across countries and regions, offering policy implications for governments at various levels and international organizations.
Second, crowdsourced data contributed by individuals represent the intentions, ideas, and behavioral tendencies of the general public, often referred to as the "silent voice," and can help raise public awareness and encourage participation in citizen science. In this sense, crowdsourced data promote a broader understanding of people's awareness regarding public health crises (e.g., COVID-19 and vaccination), environmental changes (e.g., natural hazards), and post-pandemic economic recovery [190,270,271]. Specifically, prior to the occurrence of these events and crises, crowdsourced data can enhance emergency preparedness and response through early warning systems, improved resource allocation, and better coordination on the ground. After these disasters and crises, crowdsourced data have the potential to facilitate real-time action through data mapping and monitoring, such as crisis mapping during disasters [272] or real-time air quality monitoring for public health advisories [273]. Furthermore, industries and businesses also incorporate crowdsourced geospatial data into their strategies to gain business insights and support decisions related to market analysis [274], product development [275], and logistics planning [276].
Third, due to the aforementioned advantages, crowdsourced data have wide-ranging benefits and implications in empowering communities and individuals to advocate for their needs and protect their rights [277]. They also facilitate global collaboration to address global challenges, such as climate change or pandemic tracking, which cannot be effectively tackled by traditional survey data or other types of small data. In the realm of urban planning and governance, crowdsourced data enable urban planners and government officials to make informed decisions regarding city development, transportation networks, and public infrastructure. They also enable the timely collection of feedback and suggestions from the general public through e-participation and e-governance channels [278]. These benefits can be further extended through the development of tools and platforms that not only facilitate data collection but also make the data accessible and usable for decision-makers, communities, and individuals. This, in turn, helps bridge the gap between data and action, aligning with the goals of smart city initiatives and citizen science, which aim to create inclusive cities with an improved quality of life and increased socioeconomic performance through data-driven approaches, intelligent resource management, and participatory governance [279].

Conclusion
In this comprehensive review, we have dissected the multifaceted realm of crowdsourced geospatial data, illuminating its myriad applications, inherent challenges, and expansive potential in both human and Earth observations. Our exploration traverses the diverse domains of application, analyzes the nature and contributions of the data, and examines current data collection paradigms. In doing so, we map the present landscape of this burgeoning field and chart strategic directions essential for steering future research and applications across varied sectors.
The integration of time-sensitive data collection, AI, and IoT within geospatial crowdsourcing, coupled with an inclusive approach that encompasses underrepresented communities, fosters a detailed, real-time understanding of Earth's dynamics and human experiences, supported by a strong network of contributors. The emphasis on the collection of time-sensitive data allows for the attainment of enhanced, real-time socioenvironmental insights. Furthermore, the integration of AI and machine learning technologies holds the promise of revealing more intricate patterns and understandings within these accumulated data. The incorporation of IoT innovations in conjunction with crowdsourcing methodologies yields a more detailed and holistic understanding of environments and societal interactions. It is critically important to include a broad range of perspectives, particularly from typically underrepresented communities, in these initiatives. This inclusivity not only broadens the scope and depth of the data gathered but also guarantees a representation that is truly global in scale. To maintain the viability and effectiveness of geospatial crowdsourcing, it is vital to cultivate a strong network of citizen scientists, incentivize participation effectively, and address technological disparities. This endeavor requires comprehensive educational initiatives and training programs that adequately prepare participants, thereby equipping them with the necessary skills and knowledge.
The exceptional possibilities offered by crowdsourced geospatial data in reshaping information environments are simultaneously promising and complex. This calls for a focus not only on their extraordinary potential but also on their inherent, multifaceted challenges, which require a collaborative and interdisciplinary strategy to resolve. With an eye on real-world applicability, we hope that this review will serve as a foundational reference, guiding both scholarly and pragmatic pathways in upcoming explorations and applications within this evolving field.
3 million complete checklists contributed by 899,200 birders since the launch of eBird in 2002. FrogWatch USA (https://www.akronzoo.org/frogwatch), established in 1998, is a citizen science program for volunteers at a network of chapters to collect and submit data on local frog and toad populations in the United States. Volunteers are trained to listen to and identify frog and toad calls. Frog and toad observations are recorded with location, time, and descriptions of habitat characteristics. A total of 178,795 observations have been submitted to this project by 15,641 volunteers.