The China Active Faults Database (CAFD) and its web system

Active faults serve as potential sources of destructive earthquakes. Studies and investigations of active faults are necessary for earthquake disaster prevention. This study presents a nation-scale database of active faults in China and its adjacent regions, in tandem with an associated web-based query system. This database is an updated version of the active faults data included in the Seismotectonic Map of China and its Adjacent Regions (1 : 4 000 000), which is one of the four essential maps of the mandatory Chinese standard GB 18306-2015 Seismic Ground Motion Parameter Zonation Maps of China. The data update and integration stem from regional-scale studies and surveys conducted over the past 2 decades (at reference scales from 1 : 250 000 to 1 : 50 000). The information amassed from these regional-scale studies and surveys encompasses geophysical probing, drill logging, measurement of offset landforms, sample dating, as well as geometric and kinematic parameters of exposed and blind faults, paleo-earthquake sequences, and recurrence intervals. These


Introduction
Earthquakes are among the most dangerous natural disasters in the world. A causative relationship exists between large or great earthquakes and active faults. Earthquakes with magnitude (M) ≥ 7.0 typically originate from Holocene or late Pleistocene active faults, or their epicentral zones overlap such faults. Statistical analyses reveal that nearly all earthquakes with M ≥ 8.0 and the majority of those ranging from M 7.0 to 7.9 in China are linked to rupture segments of the primary boundary faults surrounding the Tibetan Plateau Block in western China and the Ordos Block in central and eastern China (Xu and Deng, 1996; Deng et al., 2003; Zhang et al., 2003; Xu et al., 2016a). Moreover, over 70 co-seismic surface rupture zones resulting from large earthquakes align spatially with known active faults (Xu and Deng, 1996; Zhang et al., 2003; Xu et al., 2017). Hence, identifying active faults, delineating their traces and locations, determining their slip rates, and compiling the results into a comprehensive database are imperative for averting and mitigating the social and economic ramifications of earthquakes and for safeguarding lives and property (Xu et al., 2002; Xu, 2006; Tian, 2006). This article introduces a publicly accessible, national-scale database detailing the fault traces, latest active ages, and motion modes of active faults in China.
Several countries have compiled comprehensive active faults databases over the past 2 decades (Haller et al., 2004; Basili et al., 2008, 2021; Ganas et al., 2013; Langridge et al., 2016; Emre et al., 2018; Maldonado et al., 2021; Williams et al., 2022), some of which are available to the public. For instance, the National Institute of Geophysics and Volcanology of Italy published the Database of Individual Seismogenic Sources (DISS) in the 2000s and a database of active and capable faults, the ITaly HAzard from CApable faults (ITHACA, 2024) project. The latest iteration of DISS, version 3.3.0 (Valensise and Pantatosti, 2001; Basili et al., 2008, 2021), includes ∼ 200 faults. The U.S. Geological Survey established the first nationwide compilation of the US Quaternary Faults and Folds Database in the early 2000s, containing ∼ 2000 faults (Haller et al., 2004).
China is located in the convergence zone of the Indian, Eurasian, and Pacific plates, where many seismogenically active faults have developed, and it is one of the countries most severely affected by earthquake disasters, both at present and historically. An active faults database of China is essential for conducting in-depth analyses of regional crustal kinematic characteristics, intraplate earthquake features, and earthquake disaster mitigation strategies. In the 2000s, the China Earthquake Administration developed a 1 : 4 000 000-scale active tectonic database (Qu, 2008), which was derived from a 1 : 4 000 000-scale active tectonic map that encompassed over 800 active faults and 48 active folds (Deng et al., 2002, 2007b). It summarized research on active faults carried out in China before 2002. However, numerous active faults remained inadequately identified or studied during that period. In the following years, field surveys focused on investigating active neotectonic and seismic activities within the Circum-Pacific and Himalayan-Mediterranean seismic zones in China. To determine the accurate fault trace and the age of the latest reactivation of active faults capable of generating destructive earthquakes, a series of active fault survey and mapping projects (Yang et al., 2018a, b, 2020; Huang et al., 2021a, b; Lei et al., 2008; Chai et al., 2011; Xu et al., 2015) was launched in 2007 in China.
These projects consist of the following: (1) fundamental maps and data collection for national earthquake hazard prevention, such as the fifth-generation Seismic Ground Motion Parameter Zonation Maps of China (GB 18306-2015, 2015); (2) prospecting of active faults in urban regions and their earthquake risk assessments, such as "Urban active fault experimental prospecting" (2001-2003) (Pan et al., 2002; Wang et al., 2002) and the "Seismo-active-fault prospecting technology system in China" (2004-2008) (Wang, 2004; Deng et al., 2007a); (3) seismo-active-fault survey and mapping, such as "The Himalayan Plan: active fault mapping at a scale of 1 : 50 000 in the north China tectonic region and along the North-South seismic zone" and "Earthquake risk assessment of active faults in the key earthquake surveillance and prevention areas"; and (4) various other scientific projects. These four types of projects systematically analyze the published scientific literature, remote sensing data, field surveys, and dating samples from geological profiles, trenches, and boreholes to ascertain accurate geometric, activity, and kinematic parameters as well as mechanical properties of the studied active faults (Xu et al., 2015). A professional panel then reviewed the obtained evidence and parameters and rechecked the final results of these four types of projects. In every project, an overall prospecting-and-surveying-process database is built to record all project data from beginning to end. Those project databases include data associated with geophysical prospecting, drilling, offset landform measuring, and age dating (e.g., cosmogenic nuclides, optically stimulated luminescence (OSL), electron spin resonance (ESR), or 14C used for dating offset landforms, and OSL or 14C used for dating dislocated strata in trenches) as well as geometric and kinematic parameters of exposed and blind faults, paleo-earthquakes, their occurrence ages, and recurrence intervals. Data types within these region-wide databases comprise two-dimensional Geographic Information System (GIS) data, photographs, geological photos with interpreted faults and illustrations, geophysical prospecting data, copyrighted electronic literature, and scientific reports. By the end of 2019, the cumulative data from these projects had amounted to 7 TB.
The China Active Faults Database (CAFD) represents a comprehensive geospatial database consolidating the most reliable outcomes from the aforementioned projects, predicated on two fundamental databases with accuracies of 1 : 50 000 for single active fault mappings and 1 : 250 000 for regional active fault distributions. The publicly accessible web-based query system offers the latest iteration of the China Earthquake and Fault Information System (CEFIS, 2023). Section 2 introduces the history and development of nationwide active fault maps and databases in China. Data acquisition, resources, processing, compilation, and quality are discussed in Sect. 3. Additionally, Sect. 3.9 presents several classical application cases to underscore the extensive utility of the database. The construction, function, performance, and usage of the web-based active fault query system are described in Sect. 4. System users can peruse and query fault information, obtain data from the Web Feature Service (WFS) and Web Map Service (WMS) servers in GIS software (such as ArcGIS and QGIS), and add active faults as layers in their web applications.
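As a rough illustration of WFS access, the sketch below assembles a standard OGC WFS 2.0.0 GetFeature request URL without contacting any server. The endpoint `https://example.org/wfs` and the layer name `cafd:active_faults` are placeholders invented for this example and are not taken from the CEFIS system; the real service address and layer names would have to be read from the system itself.

```python
from urllib.parse import urlencode


def build_wfs_getfeature_url(endpoint, type_name, bbox=None, max_features=None):
    """Build an OGC WFS 2.0.0 GetFeature URL (no network access is performed).

    endpoint and type_name are supplied by the caller; the values used in the
    usage example are hypothetical placeholders, not real CEFIS identifiers.
    """
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,  # WFS 2.0.0 uses "typeNames" (plural)
    }
    if bbox is not None:
        # Four corner coordinates; the axis order depends on the server and CRS.
        params["bbox"] = ",".join(str(v) for v in bbox)
    if max_features is not None:
        params["count"] = str(max_features)  # WFS 2.0.0 name for the feature limit
    return endpoint + "?" + urlencode(params)


# Usage with placeholder values (assumptions, not real service identifiers).
url = build_wfs_getfeature_url(
    "https://example.org/wfs", "cafd:active_faults",
    bbox=(100, 30, 110, 40), max_features=10,
)
print(url)
```

GIS clients such as QGIS build equivalent requests internally once a WFS connection is configured, so a hand-built URL of this kind is mainly useful for scripted downloads.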

Nationwide active fault maps and databases
Various organizations and experts have compiled nationwide active tectonics and fault maps of China at different junctures. Each map summarized as much of the research available before its publication date as possible. Maps such as the "Spatial distribution map of active tectonics and strong earthquakes in China (1 : 3 000 000)" (NEIZMT, 1976), "Map of the major tectonic-system activity and strong earthquakes epicenter distribution in China (1 : 6 000 000)" (NEIZMT, 1978), "Seismotectonic map of China (1 : 4 000 000)" (GICEA, 1979), and "Lithospheric dynamics map of China and adjacent sea area (1 : 4 000 000)" (Ma, 1987) systematically synthesized the latest research achievements up to their respective periods.
In the last decade, the most influential nationwide active fault maps have been the "Active Tectonic Map of China (1 : 4 000 000)" (Deng et al., 2007b) and the "Seismotectonic Map in China and its Adjacent Regions (1 : 4 000 000)" (SMCAR; Xu et al., 2016b). The Deng et al. (2007b) study has been widely disseminated among scientists, specialists, and the general public over the past 10 years but is not available online. Its earlier version was integrated into an early version of the Active Faults of Eurasia Database (Trifonov, 2004), an updated version of which was published in 2022 (Zelenin et al., 2022). Presently, the database is freely downloadable online (NEDC, 2023), and scientists have updated this map with new findings. For instance, Wu et al. (2018) compiled a "spatial distribution map of active faults in China and its adjacent sea areas (1 : 5 000 000) (2018)" by synthesizing Chinese and English publications from the past decade and 15 years of research on active faults conducted by the Institute of Geomechanics at the Chinese Academy of Geological Sciences.
The SMCAR (Xu et al., 2016b) is a subproject of the fifth-generation Seismic Ground Motion Parameter Zonation Maps of China and is one of the four essential maps of the Chinese mandatory standard GB 18306-2015 (2015). This standard sets seismic design criteria for earthquake-resistant design in various regions. The SMCAR collected the latest reactivation ages of faults from the previously introduced nationwide maps and some public or private data. The SMCAR is now open to the public on the web system of the fifth-generation Seismic Ground Motion Parameter Zonation Maps (GB18306, 2023), and it has a geospatial database edition in addition to print and Joint Photographic Experts Group (JPEG) editions. This database integrates seismically active faults in China and its adjacent regions and is also known as the CAFD (2023). After geospatial correlation using remote sensing images in the World Geodetic System 1984 (WGS84) coordinate system, its spatial accuracy surpassed that of previous analogous maps and datasets. The fault data encompass fault attributes such as name, main characteristics, and faulting age. A simplified version is used in constructing a probabilistic seismic hazard model for mainland China (Rong et al., 2020).

Active faults database compilation workflow
The CAFD (2023) presented in this paper, which is based on the most reliable results of the projects introduced in Sect. 1, is an updated version of the CAFD (2015). The compilation workflow of the database is illustrated in Fig. 1.
The data used to update the nationwide CAFD (2015) are obtained from 120 regional project databases as well as from active fault research, earthquake surface rupture investigations, and literature published in the past 2 decades. All these databases adhere to the same technological framework and are generated under well-established principles of active fault surveys (Sect. 3.2-3.4), aligning with the technical requisites of the Chinese mandatory standard (GB/T 36072-2018, 2018). Each regional project database adheres to the identical data schema and standard as recommended by the China Earthquake Administration (GB/T 36072-2018, 2018; DB/T 53-2013, 2013; DB/T 65-2016, 2016; DB/T 81-2020, 2020; DB/T 82-2020, 2020; DB/T 83-2020, 2020). All parameter values of the fault data are computed following systematic criteria and definitions (Sect. 3.7). Given the uniform data definition, schema, and acquisition method, there is no information disparity between these project databases. All data are processed using the same workflow. First, multiscale active fault data are extracted from different databases. Second, they are used to update the geometric shapes and attributes of the corresponding fault data in a nationwide database. Finally, the updated database (2023) is translated into English, adjusted for deployment, and released online (Sect. 3.4 and 3.6).

Exposed fault survey method
Exposed faults are those with surface expressions (such as linear fault scarps, offset gullies, and folding) or fault outcrops. In the current fault database, we focus only on the locations, kinematics, and ages of these near-surface faults. The fault geometry or dip angle, as suggested by seismic data, is not included. For exposed faults with surface traces, remote sensing and digital elevation model (DEM) data are first used to map fault traces and generate an initial distribution map of active faults. Then, through field surveys, the locations of the faults in this initial map are verified, corrected, and recorded. Finally, a systematic workflow that combines geomorphological surveys, stratigraphic analyses of geological cross sections, trench stratigraphic logs, sample dating from terraces and trenches, and paleo-earthquake identification is used to obtain the latest faulting ages and kinematic parameters of the mapped active faults (DB/T 53-2013, 2013; Chen et al., 2016; Sun et al., 2017; Shi et al., 2019, 2022; Guo et al., 2021; Huang et al., 2021a). Within this workflow, accurately locating dislocated strata, samples, and trenches within typical offset landforms is crucial. The dislocated strata exposed in a trench reveal the number of paleo-earthquake events and the kinematics of the fault. The ages of the dislocated strata, measured by dating methods including radiocarbon (14C), cosmogenic nuclides (10Be), and luminescence techniques, determine the age of the fault activity. These results are stored in the regional-scale survey databases.
The Fodongmiao-Hongyazi fault, mapped at a scale of 1 : 50 000 (Yang et al., 2018a, b, 2020; Huang et al., 2021a, b), serves as an example of the quantitative technical demands outlined in the Chinese mandatory standard (GB/T 36072-2018, 2018). First, remote sensing images with meter-level resolution (QuickBird, WorldView, SPOT, etc.) and DEMs with horizontal and relative vertical resolutions of ≤ 37.5 m (SRTM 1 Arc-Second DEM, ASTER-II DEM, etc.) were used to mark surface deformations or offset landforms (fault scarps, dislocated gullies, fault valleys, pull-apart basins, pressure ridges, terraces, alluvial or fluvial fans) and to plan geological survey sites, lines, and areas. Following these marks and positions, the fault is traced along its strike, and the coordinates of the exposed fault sites are precisely recorded using Global Navigation Satellite System (GNSS) and handheld GPS receivers. The average interval of the coordinate-recording sites is 500-2000 m, but if the surveyor can access a site, an interval of 500 m is required (Fig. 2; DB/T 2013). The density of the recorded sites controls the geometric accuracy of the fault data. The horizontal location error of every recorded site was less than 15 m. If the surface deformations or offset landforms disappear in some areas, the approximate fault location should be taken from the original interpretation of the high-resolution remote sensing images and DEM data. Subsequently, the next exposed fault segment is searched for by traversing the region along a Z-shaped route. When the next fault segment is identified, the fault is again traced along its strike. After these steps, the geometry of the fault trace is finally confirmed on the map. Once the fault trace is ascertained, the fault is divided into segments based on the geological landforms, geometric structure (straight, curved, bent, etc.), displacement distribution, seismic rupture characteristics, or signs of fault activity, so that each segment is relatively independent. Along the key segments, typical offset landforms should be selected for further geomorphic and topographic measurements (Fig. 3a and c). In every independent segment, dislocated strata, samples, and trenches are accurately located, and the number of paleo-earthquake events is read from the trenches (Fig. 3b and d). Dating of dislocated strata only provides the maximal age of a rupturing event. To obtain a more reliable age of the event, both ruptured and non-ruptured units have to be dated. Common dating methods include radiocarbon (14C), cosmogenic nuclides (10Be), and luminescence techniques. Because the fault is divided into segments based on the mapped geometry, the ages obtained within a single geometric segment represent the age of that segment. They are used to identify whether a fault is active and to calculate its slip rate over a given period.
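The site-spacing requirement above lends itself to a simple automated check. The following sketch flags consecutive recording sites that are farther apart than a chosen limit, using the haversine great-circle distance on a spherical Earth. The function names, the 2000 m default (the upper end of the stated 500-2000 m range), and the sample coordinates are assumptions made for illustration and are not part of the survey standard or the project software.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def spacing_gaps(sites, max_interval_m=2000.0):
    """Return (index, distance_m) for each consecutive pair of sites whose
    separation exceeds max_interval_m. sites is an ordered list of (lon, lat)
    tuples along the fault trace."""
    gaps = []
    for i in range(len(sites) - 1):
        d = haversine_m(*sites[i], *sites[i + 1])
        if d > max_interval_m:
            gaps.append((i, d))
    return gaps


# Hypothetical sites along the equator: roughly 1.1 km, then roughly 4.4 km apart.
sites = [(100.00, 0.0), (100.01, 0.0), (100.05, 0.0)]
for index, distance in spacing_gaps(sites):
    print(f"gap after site {index}: {distance:.0f} m")
```

Passing `max_interval_m=500.0` would apply the stricter interval required where sites are accessible.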

Blind fault survey method
Blind (buried) faults are those that lack near-surface exposure and may be concealed by overlying sediments or rock formations. First, we collected petroleum exploration profiles, historical earthquake records, and the literature. The location of the blind faults was inferred from the collected petroleum exploration profiles. Second, the historical earthquakes and published references on tectonic settings helped us to identify the faults associated with earthquakes. Third, a comprehensive multilevel exploration method, with geophysical surveys and drilling sites near the collected petroleum exploration profiles, was applied to determine the exact near-surface location and the position of the uppermost breakpoint of each major blind fault. Samples obtained by drilling were then dated, and the ages of the displaced and undisplaced strata were used to identify Late Quaternary activity. This method encompasses multilevel seismic exploration, joint drilling to establish faults across geological sections, trenching, and other technologies aimed at detecting blind active faults from deep to shallow depths.
In this study, the blind Yinchuan active fault serves as an illustrative example (Fig. 4; Chai et al., 2006, 2011; Liu et al., 2008) to elucidate the quantitative technical requirements outlined in the Chinese mandatory standard (GB/T 36072-2018, 2018). First, the seismic petroleum exploration profiles are used to reveal the approximate location of the target fault at a depth of hundreds of meters and the bottom of the Quaternary, marked by the shallowest continuous seismic reflection layer. Based on this information, a set of shallow seismic exploration profiles (at intervals of ≤ 2.5 km) is laid out over the approximate surface location to detect the depth of the uppermost point of the target fault. Second, two boreholes are drilled on both sides of the detected target fault to preliminarily verify the existence of the target fault (Fig. 5). During this exploration phase, the number of boreholes is gradually increased on both sides of the target fault to locate the depth of the uppermost points of the fault (Fig. 6; Chai et al., 2006; Lei et al., 2008; Wang et al., 2016). At least three boreholes are required on each fault wall, at intervals of 5-45 m. The distance between the two boreholes on either side of the target fault should be less than 10 m. In addition, at least one borehole on each side is required to penetrate the bottom of the Upper Pleistocene, and the other boreholes need to reach a final depth 10 m beneath the uppermost points determined by the shallow seismic exploration (GB/T 36072-2018, 2018). The exact location and faulting age of the target blind fault can be identified by stratigraphic analysis and sample dating of the borehole cores. If the uppermost points determined by the joint drilling are less than 10 m below the ground surface, more information on the blind fault geometry and paleo-earthquakes can be revealed by trenching. The mapped blind fault trace comprises the uppermost points projected vertically onto the ground surface, obtained through the comprehensive multilevel exploration approach.
https://doi.org/10.5194/essd-16-3391-2024 Earth Syst. Sci. Data, 16, 3391-3417, 2024
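The joint-drilling requirements described above can be expressed as a small layout check. The sketch below is a minimal illustration that assumes boreholes are described only by their position in meters along a profile crossing the fault; it reads the 5-45 m interval as applying between neighboring boreholes on the same wall and the 10 m limit as applying to the nearest pair straddling the fault. Both readings, and all names in the code, are assumptions made for this example; depth requirements are not checked.

```python
def check_borehole_layout(positions_m, fault_position_m):
    """Return a list of human-readable problems for a joint-drilling layout.

    positions_m: borehole positions along the profile, in meters.
    fault_position_m: inferred fault position on the same axis.
    An empty list means no problem was found under the assumed reading of the
    spacing rules (>= 3 boreholes per wall, 5-45 m spacing within a wall,
    < 10 m between the two boreholes nearest the fault on opposite walls).
    """
    walls = {
        "first wall": sorted(p for p in positions_m if p < fault_position_m),
        "second wall": sorted(p for p in positions_m if p > fault_position_m),
    }
    problems = []
    for name, wall in walls.items():
        if len(wall) < 3:
            problems.append(f"{name}: {len(wall)} boreholes, at least 3 required")
        for a, b in zip(wall, wall[1:]):
            if not 5.0 <= b - a <= 45.0:
                problems.append(f"{name}: spacing {b - a:.1f} m outside 5-45 m")
    near, far = walls["first wall"], walls["second wall"]
    if near and far and far[0] - near[-1] >= 10.0:
        problems.append(f"straddling pair {far[0] - near[-1]:.1f} m apart, must be < 10 m")
    return problems


# Hypothetical layout: three boreholes per wall, 8 m across the fault.
print(check_borehole_layout([-60, -30, -4, 4, 30, 60], fault_position_m=0.0))
```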

Data sources and fundamental works
The CAFD (2023) was updated by integrating new data from the projects of active fault surveying in urban regions, seismically active fault mapping at a scale of 1 : 50 000 in northern China and the North-South Seismic Zone, seismic risk assessment of active faults in key earthquake surveillance and prevention areas, and other scientific research endeavors (Fig. 7). These projects are introduced in detail below. The projects on active fault surveying and seismic risk assessment primarily aim to identify blind or exposed seismically active faults and evaluate earthquake risk in large- and medium-sized cities as well as in the key earthquake surveillance and prevention areas. Conversely, active fault mapping projects focus on pinpointing detailed locations of exposed seismically active faults to facilitate land-use planning and utilization (Xu et al., 2015; Zhu et al., 2005; Chai et al., 2011; Liang and Wu, 2013; Shen and Bo, 2013; Hou et al., 2012; Chen, 2013; Yang, 2010). Fundamental works within these projects encompass five main aspects: (1) identification of fault activity, (2) detection of deep fault structures within the crust, (3) assessment of earthquake risk associated with major identified faults to discern seismically active faults, (4) precise delineation of the geometry of major seismically active faults, and (5) evaluation of earthquake hazards posed by seismically active faults. Achievements of these projects include maps illustrating the regional distribution of the active faults at a scale of 1 : 250 000 and detailed fault traces of a single active fault at a scale of 1 : 50 000, exploration reports, project databases, and information systems (Xu et al., 2015). All the data obtained from the fundamental works and the project achievements are carefully reviewed by three to eight experts from a professional panel; the results are therefore considered credible. By March 2020, these projects had been carried out in ∼ 100 cities, including 26 provincial capitals and municipalities. Twenty urban active fault survey project databases (Table A1), which were compiled from 2002 to 2009 in Beijing, Tianjin, Shanghai, Nanjing, Ningbo, Zhengzhou, Qingdao, Hohhot, Taiyuan, Xi'an, Yinchuan, Lanzhou, Xining, Lhasa, Kunming, Urumqi, Haikou, Guangzhou, Changchun, and Shenyang (Fig. 7), are the earliest fault data published and released to the public (Xu et al., 2015) and are also used to update the nationwide active faults database.
Active fault survey and mapping projects at scales of 1 : 50 000 and 1 : 250 000 received funding from the China Earthquake Administration, with the objective of acquiring the precise location, spatial distribution, geometric and kinematic parameters, activity ages, slip rates, paleo-earthquake events and their recurrence intervals, and elapsed time since the last surface-rupturing event on the faults. These projects followed the procedure introduced in Sect. 3.2-3.4 and meet the quantitative requirements of mandatory and recom-  A2).
Scientific research projects focused on addressing specific scientific inquiries concerning active faults and earthquakes. These projects delved into seismotectonics and seismogenesis based on geometric and kinematic features of faults within specific sites or regions. The results provide parameters such as reliable slip rates, paleo-earthquake sequences, potential magnitudes of future earthquakes, coseismic slips, and their distribution along the strike of seismogenically active faults. To update the nationwide active faults database in this study, data from the 2021 M 7.4 Madoi Earthquake investigation (Chen et al., 2022), 2008 M 8.0 Wenchuan Earthquake investigation (Z. Xu et al., 2008; X. Xu et al., 2008, 2009; Xu, 2009; Chen et al., 2009), 2014 M 6.5 Ludian Earthquake investigation (X. Xu et al., 2014; C. Xu et al., 2014), Xia-Dian fault survey (Xu et al., 2000; He et al., 2013), and research on the Tanlu fault (Shu et al., 2016, 2020; Li et al., 2019) were utilized (Fig. 7).

Data processing
The CAFD (2015), along with project databases of active fault surveys and mappings in various regions, constituted a complex dataset characterized by multiple scales and varying accuracies, with a considerable total data size. As detailed in Sect. 1, the databases of active fault surveys and mappings comprise 100 sub-databases, totaling ∼ 7 TB of data. All data undergo rigorous supervision and review by professional panels. Because the primary objective of this study is to present the latest integrated achievements, the reliability of individual fault data is not rated; instead, the source of each fault record is included. To integrate these extensive datasets into a unified database with consistent data criteria and schemas, we first integrated the project databases constructed on a regional scale and then updated the national-scale CAFD (Fig. 1). Significant updates to the 1 : 4 000 000 nationwide database include the refinement of activity ages and fault locations for the late Pleistocene and Holocene faults.
The project databases of active fault surveys and seismically active fault mappings are constructed using the same criteria. They have the same data schema and use unified, well-established acquisition methods. Consequently, fault data from these two types of databases can be processed using consistent procedures, with minimal effort required for data cleaning and mining. The first step involves extracting multiscale fault data from the 120 project databases. The second step is to integrate them. These projects are strategically planned to avoid overlap of fault data with the same scale. If more than one version of a fault trace exists for the same region, only the largest-scale data are used for integration. As the scales of the new well-mapped fault traces are equal to or even larger than 1 : 250 000, they are too complex to be integrated into 1 : 4 000 000-scale data. Therefore, the third step is to simplify the fault traces. In large-scale fault data, a fault is generally segmented for detailed investigation; hence, contiguous segments may have different activity ages. One of the most important applications of the database is producing hardcopy or electronic image maps for earthquake emergency response (Wu et al., 2021). The reference scale of the hardcopy maps is about 1 : 4 000 000-1 : 1 000 000. If contiguous segments that together span no more than 2 cm on the map have different activity ages, they are merged for map generalization. When integrated into the national-scale fault data, two or more small contiguous segments may thus be merged into one. In this case, the activity age of the merged fault trace is the latest of the ages of the merged segments. For example, the blind Yinchuan fault (1 : 250 000) is divided into the Holocene northern segment (Fig. 4a, red dotted line in the blue rectangle) and the late Pleistocene southern segment (Fig. 4a, orange dotted line in the blue rectangle). The total length of those two segments is 80 km, which is only 2 cm on a 1 : 4 000 000 scale map. Therefore, the two segments are merged into a Holocene one.
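The generalization rule can be made concrete with a short sketch. Map length follows from ground length divided by the scale denominator, so 80 km at 1 : 4 000 000 is 80 × 10^5 cm / 4 000 000 = 2 cm. The code below greedily merges contiguous segments along strike while their combined map length stays within the threshold and assigns the youngest activity age to the merged trace. The greedy strategy, the function names, and the individual segment lengths in the usage example (35 km and 45 km) are assumptions for illustration only; just the 80 km total and the resulting Holocene age come from the Yinchuan example.

```python
# Rank of activity ages from youngest (0) to oldest (3).
AGE_RANK = {
    "Holocene": 0,
    "late Pleistocene": 1,
    "middle-early Pleistocene": 2,
    "pre-Quaternary": 3,
}


def map_length_cm(ground_length_km, scale_denominator):
    """Length on the map in cm for a ground length in km (1 km = 1e5 cm)."""
    return ground_length_km * 1e5 / scale_denominator


def merge_short_runs(segments, scale_denominator=4_000_000, threshold_cm=2.0):
    """Greedily merge contiguous segments whose combined map length is within
    threshold_cm. segments is an along-strike ordered list of
    (length_km, age) tuples; the merged age is the youngest of the group."""
    merged = []
    for length_km, age in segments:
        if merged:
            prev_len, prev_age = merged[-1]
            combined = prev_len + length_km
            if map_length_cm(combined, scale_denominator) <= threshold_cm:
                younger = min(prev_age, age, key=AGE_RANK.__getitem__)
                merged[-1] = (combined, younger)
                continue
        merged.append((length_km, age))
    return merged


# Two segments totalling 80 km (individual lengths are hypothetical).
print(map_length_cm(80.0, 4_000_000))
print(merge_short_runs([(35.0, "Holocene"), (45.0, "late Pleistocene")]))
```

Longer runs are left unmerged, so a 110 km pair of segments would keep its separate ages at this scale.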
Scientific research predominantly focuses on individual segments or a limited number of surveying sites for a given fault. These data supplement the CAFD (2023), completing and correcting the national-scale fault data using a similar methodology. While the CAFD (2015) shares the same data definition and acquisition methods as previously described, its data schema differs slightly from that of the project databases in field names and domain values. Considering that the CAFD (2015) has fewer fault traces than the project databases, we adjust the CAFD schema to fit the project databases. Subsequently, the processed data introduced in the previous two paragraphs are integrated into the CAFD (2023). In the last step, the CAFD (2023) is adjusted for deployment in the Web GIS system before being published.

Data descriptor
The active faults database is translated into English prior to deployment in the system and global release, facilitating access for scientists and engineers worldwide. Its fields are the fault zone name, fault name, fault segment name, kinematic features, and activity age (Table A3).
The fault data are graded according to size and characteristics using the "fault zone name", "fault name", and "fault segment name" fields. A fault zone is a cluster of parallel faults such as the Tanlu and Longmenshan fault zones. In general, faults in the same fault zone are congruent in geometry and kinematics, accumulate crustal strain together, or are possibly connected at depth. A fault zone consists of a single fault or several faults. A single fault is further subdivided into multiple segments based on geometry. Each segment has specific and different geometric characteristics and is the basic study unit of a fault. Take the Fodongmiao-Hongyazi fault (FHF) as an example (Huang et al., 2021b) (Fig. 8). It is one fault of the northern margin fault zone of the Qilianshan Mountains. Its western and middle segments exhibit divergent geometric patterns: the western segment trace displays linear characteristics, while the middle segment trace resembles jagged teeth. Moreover, the eastern segment, distinct from the middle segment, is delineated by an anticline and follows a linear trajectory. Not all faults are affiliated with a fault zone.
Only some important active fault line data belonging to fault zones in block boundary zones are assigned a fault zone name. Additionally, highly scrutinized faults are segmented, and the corresponding fault line data have fault segment names. Given the intricacies and vast number of faults in China, the process of rating and naming must continue.
The field named "feature" stores information regarding the motion type and visibility of the fault on the ground. Based on the relative movement of the two walls, the faults were classified as normal, reverse, strike-slip, or oblique faults. Oblique faults consist of left- and right-oblique slip faults, with vertical components that might be either normal or reverse. Active faults are also divided into exposed and buried faults.
The active faults database is aimed at earthquake disaster prevention and focuses on the latest activity during the Quaternary. Therefore, faults are classified as Holocene, late Pleistocene, middle-early Pleistocene, and pre-Quaternary faults, denoted by the field "Age" (GB/T 36072-2018, 2018). Holocene faults are those with evidence of activity in the Holocene or the past 12 000 years. For late Pleistocene faults, evidence of activity exists in the late Pleistocene but not in the Holocene. Middle-early Pleistocene faults are those with the latest evidence of activity in the middle or early Pleistocene. For pre-Quaternary faults, evidence of activity is not available in the Quaternary. This means that no evidence shows that the fault displaced Quaternary landforms or sediments and that no Quaternary fault age information, such as ESR dating of fault gouge, is available. The main evidence of activity is the latest dislocated stratum. This method is introduced in Sect. 3.3-3.4.

Figure 6. Joint drilling geological cross section at Xinqushao in the Yinchuan Basin (adapted from Lei et al., 2008; Chai et al., 2011).
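The age classes described above amount to a simple decision rule on the age of the youngest evidence of activity. The following Python sketch is illustrative only and is not the procedure used to build the database: the function name is hypothetical, and only the 12 000-year Holocene limit comes from the text. The other two limits are assumed values taken from the international chronostratigraphic chart and may differ from those applied under GB/T 36072-2018.

```python
from typing import Optional

# Upper age limits of each class, in years before present.
HOLOCENE_MAX_YR = 12_000            # stated in the text
LATE_PLEISTOCENE_MAX_YR = 129_000   # assumption, not given in the text
QUATERNARY_MAX_YR = 2_580_000       # assumption, not given in the text


def classify_latest_activity(latest_activity_yr: Optional[float]) -> str:
    """Return the age class for the youngest evidence of fault activity.

    latest_activity_yr is the age (years before present) of the youngest
    dislocated stratum or landform. None means that no Quaternary evidence
    of activity is available.
    """
    if latest_activity_yr is None:
        return "pre-Quaternary"
    if latest_activity_yr < 0:
        raise ValueError("age must be non-negative")
    if latest_activity_yr <= HOLOCENE_MAX_YR:
        return "Holocene"
    if latest_activity_yr <= LATE_PLEISTOCENE_MAX_YR:
        return "late Pleistocene"
    if latest_activity_yr <= QUATERNARY_MAX_YR:
        return "middle-early Pleistocene"
    return "pre-Quaternary"


if __name__ == "__main__":
    for age in (3_000, 50_000, 500_000, None):
        print(age, "->", classify_latest_activity(age))
```

In practice the class is assigned from geological evidence such as the latest dislocated stratum rather than from a single number, so the sketch only captures the ordering of the classes.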

Quality discussion
The CAFD (2023) collects the maximum amount of reliable data from the projects introduced in Sect. 3.5, which were launched by the China Earthquake Administration with the primary objective of effectively preventing earthquake disasters by determining earthquake sources and carrying out active tectonic zonation. The spatial correlation between faults and earthquakes with magnitudes greater than 6.5 is high. The amount of fault data is large: the database contains ∼ 7000 fault traces, of which ∼ 1600 faults are named. The fault names were collected from published or unpublished papers, the geological literature, or existing fault databases and follow two primary naming methods. One is to name faults after mountains and rivers. The other is to name them after a place (county, village, etc.).
Active fault surveys in China are difficult because the country is located in the intersection region of the Circum-Pacific and Eurasian seismic zones, resulting in complex continental tectonics, widespread distribution of active faults, intense neotectonics and seismic activity, and inaccessible terrain. The extent of seismogenic fault research varies from region to region. For faults that have been little studied, the geometric and kinematic parameters remain unknown or imprecise. In the periphery and interior of the Tibetan Plateau, which was formed during the collision of the Indian and Eurasian plates, there exist mega-strike-slip fault systems (e.g., the Altyn Tagh, eastern Kunlun, and Xianshuihe faults), thrust fault systems (e.g., the Himalayan frontal, Hexi Corridor, and Longmenshan thrusts), and N-S-striking normal faults in the western part of the plateau. Thrusts and strike-slip faults have also developed at the northern and southern piedmonts and in their interior. In some regions in the Tien Shan (see Fig. 7) and Tibetan Plateau, due to the high altitude and snowcaps, it is difficult to carry out research work and obtain accurate fault data. Numerous oblique normal faults around the Ordos Block and strike-slip faults, such as the Tanlu fault in eastern China, also exist. Those faults are located in regions with dense urban construction and populations or thick Quaternary deposits. Therefore, it is difficult to find those fault traces on the ground and locate the blind faults underground. Besides, the spatial relationship and geometrical link of some faults, such as some segments of the Tanlu fault in eastern China and some E-W-trending faults on the Tibetan Plateau, also remain unclear.
Thus, the vector lines of such faults directly cross each other without showing a geometrical link.In addition, research on marine and maritime island faults has been limited by surveying technology.
Given that the China Active Faults Database is primarily based on 1 : 4 000 000 data but updated with 1 : 250 000-1 : 50 000 data, the overall coordinate accuracy is comparable to that of a 1 : 1 000 000 map, on which 1 mm, the width of the fault line symbol, represents 1 km in the real world. In places, the data precision is better than that of the 1 : 1 000 000 map because the reference data are on a larger scale.

Earth Syst. Sci. Data, 16, 3391-3417, 2024, https://doi.org/10.5194/essd-16-3391-2024
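The relation between map scale and ground distance used in the accuracy statement above is plain arithmetic: a distance on the map multiplied by the scale denominator gives the distance on the ground. The short Python sketch below works this out for the scales mentioned in this paper; the function name is hypothetical and the snippet is only an illustration, not part of the database tooling.

```python
def ground_distance_m(map_distance_mm: float, scale_denominator: int) -> float:
    """Convert a distance measured on a map (mm) to a ground distance (m)."""
    # map mm * denominator = ground mm; divide by 1000 to obtain metres
    return map_distance_mm * scale_denominator / 1000.0


if __name__ == "__main__":
    # Scales mentioned in the text: national map, effective accuracy,
    # and the range of the regional surveys.
    for denominator in (4_000_000, 1_000_000, 250_000, 50_000):
        metres = ground_distance_m(1.0, denominator)
        print(f"1 : {denominator:,d}: 1 mm on the map = {metres:,.0f} m on the ground")
```

A 1 mm line symbol therefore covers 1 km at 1 : 1 000 000, 4 km at 1 : 4 000 000, and only 50 m at 1 : 50 000, which is why traces updated from the regional surveys can be positioned more precisely than the national base data.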
In conclusion, the CAFD (2023) reflects the latest research on active faults and fault data integration.More investigations of active tectonics and fault systems should be carried out in China, and the nationwide database should be updated in the future.

Application
The CAFD (2023) and its previous versions have been widely used by the Chinese government, research institutions, and associated companies. The National Geomatics Center of China and the China Petroleum & Chemical Corporation use the data as reference material for analyzing the seismotectonic environment within their information management systems. The national-scale China Active Faults Database is the basic reference for compiling seismotectonic maps on both regional and national scales. Examples include the seismotectonic map of the Ordos Block and its boundary zones (1 : 500 000; National Research and Development Program of China; no. 2017YFC150100), the digital seismotectonic map of the northeastern seismic zone in China (1 : 1 000 000; a Spark Plan project funded by the China Earthquake Administration; no. XH18015), the seismotectonic map of Shanxi Province and its adjacent regions (1 : 500 000; a public service map produced by the Shanxi Earthquake Agency), and the seismotectonic map of China (1 : 1 000 000; the first comprehensive natural hazard risk investigation in China). Furthermore, these data play a crucial role in earthquake emergency response services, monitoring services, forecasting services, and earthquake disaster prevention, all under the supervision of the China Earthquake Administration. Since 2018, the database has been provided to the earthquake response departments of the China Earthquake Networks Center for emergency actions. In the same year, it was also made available to the working research group of the post-earthquake prediction technology system, a key project in earthquake monitoring and forecasting from the China Earthquake Administration (project no. 18440680117). In 2019, this database was transferred to the China Earthquake Disaster Prevention Center to establish the Data Center for Seismic Active Fault Surveys. The Institute of Geology, China Earthquake Administration (IG, CEA), has also produced seismotectonic maps during earthquake emergencies based on the database (Wu et al., 2021). The WFS hosted in our system has been utilized by a commercial app (GeoQuater).

System introduction
4.1 System performance and architecture
CEFIS (2021), which has been available online and continuously updated since 2019, offers web services for querying earthquakes and active faults in China and adjacent regions. The CAFD (2015) has been released into the system. In 2021, the system was updated again by simplifying the interface and adding English fields, an earthquake base map, a layer addition function, and a system sharing function. Additionally, a simplified version of a regional active fault survey map (Wu et al., 2022), introduced in Sect. 3.3, is also open to the public as a map service within the additional layer list. This section provides an overview of the architecture of the current version (CEFIS, 2023).
The system is built on the ArcGIS Enterprise platform 10.6 using ArcGIS Web AppBuilder in browser/server (B/S) mode and is divided into four layers (Fig. 9): the data, service, portal, and application layers.
The data layer deploys the PostgreSQL database to store active fault and earthquake data. PostgreSQL, a free open-source object-relational database system, can be connected to an ArcGIS Server deployed in the service layer. The ArcGIS Server can publish data in the form of map and feature services, which are easily accessible through web applications. The ArcGIS Portal and ArcGIS Web Adaptor are deployed in the portal layer to provide WMS and WFS and manage user access. The ArcGIS Portal provides intuitive what-you-see-is-what-you-get applications such as AppBuilder with ready-to-use widgets, which enable the construction of map and three-dimensional scene applications on the web. The system discussed in this study is also developed using ArcGIS Web AppBuilder. Supported by these technologies, the CAFD can be accessed across various platforms, including desktop software, smartphones, and online sites.

Earthquake sequence data
The system uses earthquakes (M > 5.0) as a background to illustrate seismic activity on and around faults. The earthquake catalog sourced from the National Earthquake Data Center (NEDC) is converted into geographic vector data and deployed in the system (NEDC, 2024a). This dataset encompasses historical and instrumental earthquake records preceding June 2021. The system contains three earthquake layers corresponding to the three datasets obtained from the NEDC, which are organized by time period: a historical earthquake catalog (prior to ∼ 31 December 1969; Table A4), the earthquake catalog from the China Earthquake Networks (CEN; 1 January 1970-31 December 2008; Table A5), and the official earthquake catalog from the CEN (1 January 2009-31 July 2023; Table A6). The historical earthquake catalog compiled by Gu (1983) includes destructive earthquakes (M ≥ 5.0) that occurred from 1831 BC to 1969 AD. The CEN earthquake catalog comprises data from 88 national seismograph network stations (digital), regional someone-on-duty network stations (digital), and simulated network stations. The official earthquake catalog from the CEN is obtained from the nationwide earthquake monitoring station networks, comprising national and regional (31) station networks, after 1 January 2009.
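Because the three catalogs cover consecutive time periods, the layer that should contain a given event follows directly from its date. The Python sketch below illustrates this lookup under stated assumptions: the dataset names are those appearing in the figure captions and Table A6, the date ranges are those quoted above, and the function name and the tuple representation of dates (with non-positive years standing for BC dates) are hypothetical conveniences, not part of the system.

```python
from typing import List, Optional, Tuple

# (year, month, day); non-positive years are used here for BC dates (assumption)
Date = Tuple[int, int, int]

# (dataset name, first date covered or None if open-ended, last date covered)
CATALOGS: List[Tuple[str, Optional[Date], Date]] = [
    ("HistoryEqT19691231", None, (1969, 12, 31)),
    ("CSNEq19700101T20081231", (1970, 1, 1), (2008, 12, 31)),
    ("FormalEq20090101T20230730", (2009, 1, 1), (2023, 7, 31)),
]


def catalog_for(event_date: Date) -> Optional[str]:
    """Return the name of the catalog whose period contains event_date."""
    for name, start, end in CATALOGS:
        # Tuples compare element by element, i.e. chronologically.
        after_start = start is None or event_date >= start
        if after_start and event_date <= end:
            return name
    return None  # later than the last catalog


if __name__ == "__main__":
    for date in ((1556, 1, 23), (2008, 5, 12), (2013, 4, 20), (2024, 1, 1)):
        print(date, "->", catalog_for(date))
```

The lookup only reflects the period boundaries; it does not check the magnitude threshold or whether an event is actually recorded in the corresponding catalog.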

System interface and function
The system is a web map application that displays and queries the Active Faults Database of China and its adjacent regions. It is publicly available worldwide. Its interface comprises seven components (Fig. 10): (a) web map, (b) eagle-eye map, (c) attribute table, (d) browsing tools, (e) address geolocation tool, and (f and g) system tools.
The language of the system interface is determined by the browser's default settings.The data field values are presented in both English and simplified Chinese within the attribute table and query dialog.Thus, the system can be used by both English and Chinese speakers.Additionally, people proficient in languages such as French, German, and Russian can also use the system.

Data query and export
The system offers four methods for querying fault information. (1) The menu within the attribute table window (Fig. 10a) provides a "filter" tool to query faults with certain conditions (Fig. 11a). (2) The second tool is a spatial selection tool (Fig. 10f) for fault and earthquake data (Fig. 11c). (3) The third tool allows fault queries by feature, activity age, or name under specific spatial conditions (Figs. 11b and 10f). (4) The address geolocation tool (Fig. 10e) can be used to zoom the map into a specific region and export the faults in that region (Fig. 11d). The system supports exporting query results through various methods (Fig. 11).

How to use the data service
The system publishes the Open Geospatial Consortium (OGC) WFS and WMS of the China Active Faults Database. The OGC WFS and WMS are dynamic services that offer real-time maps on the web, adhering to OGC specifications. These services facilitate open and authentic access to web maps across different platforms and clients. The available operations of the WMS in the system in this study are GetCapabilities, GetLegendGraphic, GetSchemaExtension, GetFeatureInfo, GetMap, and GetStyles. Compared with WMS, WFS provides greater data access: the WFS standard allows geographic elements to be inserted, updated, deleted, retrieved, and discovered via HTTP in a distributed environment. The available WFS operations in this system encompass GetCapabilities, DescribeFeatureType, GetPropertyValue, GetFeature, GetGmlObject, ListStoredQueries, and DescribeStoredQueries. In essence, data can be browsed, queried, analyzed, and downloaded from the system but not revised. Furthermore, fault layers can be integrated into GIS software, such as ArcGIS Pro and QGIS, for analysis through WMS and WFS (Fig. 12).

X. Wu et al.: The CAFD and its web system
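The WFS operations listed above can be called with plain HTTP GET requests that use the standard OGC key-value parameters. The Python sketch below only builds such request URLs and sends nothing over the network. It is a minimal illustration under assumptions: the endpoint address and the layer name `CAFD:Faults` are hypothetical placeholders (the actual service address is the one cited as CAFD WFS, 2024), and the function name is ours.

```python
from urllib.parse import urlencode

# Hypothetical placeholder, NOT the real service address (see CAFD WFS, 2024).
WFS_ENDPOINT = "https://example.org/arcgis/services/CAFD/MapServer/WFSServer"


def wfs_request_url(operation: str, **params: str) -> str:
    """Build a WFS 2.0.0 key-value GET request URL for the given operation."""
    query = {"service": "WFS", "version": "2.0.0", "request": operation}
    query.update(params)
    return f"{WFS_ENDPOINT}?{urlencode(query)}"


# List the layers and operations offered by the service.
capabilities_url = wfs_request_url("GetCapabilities")

# Retrieve the first 10 features of a layer; "CAFD:Faults" is a hypothetical
# layer name, the real one must be read from the GetCapabilities response.
features_url = wfs_request_url("GetFeature", typeNames="CAFD:Faults", count="10")

if __name__ == "__main__":
    print(capabilities_url)
    print(features_url)
```

A client would normally issue GetCapabilities first, read the advertised layer names and output formats from the response, and only then request features. GIS software such as QGIS performs the same sequence when a WFS connection is added.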

Conclusions
The CAFD (2023) integrates both the national-scale fault database and the latest regional-scale fault survey data of the past decades. The database and its previous versions have been widely applied in government departments, research institutes, and commercial companies. China is situated at the intersection between the Circum-Pacific and Eurasian seismic zones, with complex continental tectonics, numerous active faults, and frequent earthquake activity. However, challenges persist in surveying or locating active faults in certain regions due to inaccessible terrain and anthropogenic activities. Active faults should be considered for earthquake disaster mitigation. Therefore, the database will be gradually updated as new references become available, and a later version of the system may be released once it is complete. The first version of the web system (CEFIS, 2021) performed effectively for nearly 2 years, and its second version (CEFIS, 2023) was released in 2021 and has also been operating well. This study delineates the architecture, interface, function, and usage of the system, serving as a platform for querying and analyzing the integrated active faults database in China. This database encapsulates crucial information pertaining to the location, latest activity age, and kinematic characteristics of the faults. The system can be used by both English and Chinese speakers. Additionally, people proficient in languages such as French, German, and Russian can also use the system.
Data and services are openly shared worldwide via the web system. The data can be downloaded from a browser or GIS software. A third-party application can link to and use the WMS and WFS (CAFD WMS, 2024; CAFD WFS, 2024). Users can get help from the ArcGIS online documentation. Section 4.4 lists the available operations of the services.
Acknowledgements. We acknowledge all the colleagues who have contributed to the active faults databases in China in the past decades. Many scientists and engineers have participated in active fault surveys and mappings and made efforts to build regional-scale project databases. Their work provided basic high-quality materials. The earthquake catalogs are downloadable from the Data Sharing Infrastructure of the National Earthquake Data Center (http://data.earthquake.cn, last access: 29 June 2024). We acknowledge the data support from the China Earthquake Networks Center and the National Earthquake Data Center.

Review statement. This paper was edited by Giulio G. R. Iovine and reviewed by Kirsten Elger, Alexander Strom, and one anonymous referee.

Figure 1. Workflow to construct the China Active Faults Database.

Figure 2. Survey sites for mapping of the Fodongmiao-Hongyazi fault. The average interval of coordinate-recording sites ranges from 500 to 2000 m. The fault belongs to the Qilianshan thrust fault zone at the northeastern margin of the Tibetan Plateau.

Figure 3. Key fault segment surveying example from the Fodongmiao-Hongyazi fault (red line). (a) Distribution of the key fault segments in which geomorphic measuring, trenching, sample collecting, and paleo-earthquake trenching sites are marked by dark blue rectangles. (b) The locations of trench TC4 and topographic profile P-P are represented by the black rectangle and black line, respectively. (c) Topographic profile (P-P) showing fault offsets (adapted from Huang et al., 2021b). (d) Interpretation of the eastern wall of trench TC4 in detail (adapted from Huang et al., 2021b).

Figure 4. Map of the blind Yinchuan fault in the Yinchuan Basin located in the northern portion of the North-South Seismic Zone in China.

Figure 5. Approximate locations of the detected target fault and boreholes along a seismic exploration profile (adapted from Chai et al., 2011; Xu et al., 2015).

Figure 7. Sketch map of updated fault data in China.

Figure 8. Map of (a) the northern fault zone of the Qilianshan Mountains and (b) Fodongmiao-Hongyazi fault.

Figure 10. System interface. (a) Web map displaying only fault traces in the full-extent view; when zoomed into the regional scale, earthquake epicenters will appear on the map. (b) Accordion overview map. (c) Accordion attribute table. (d) Navigation toolbar; the tools from top to bottom are zoom in, zoom out, default extent, zoom into the current position, full extent, previous view, and next view. (e) Address geolocation tool. System tools: (f) measurement, selection, inquiry, and layer addition (from left to right) and (g) legend, layer controller, base maps, and sharing (from left to right). They were screenshotted from our system (CEFIS, 2023).

Figure 12. (a) How to add WMS Server in ArcGIS Pro. (b) How to add WFS Server in QGIS and export data. They were screenshotted from the ArcGIS Pro or QGIS software.

Figure 13. The translated English websites for FormalEq20090101T20210630. This was screenshotted from the website of the NEDC.

Figure 14. The translated English websites for CSNEq19700101T20081231. This was screenshotted from the website of the NEDC.

Figure 15. The translated English websites for HistoryEqT19691231. This was screenshotted from the website of the NEDC.

Financial support. This research has been supported by the National Natural Science Foundation of China (grant no. 41941016) and the Institute of Geology, China Earthquake Administration (grant no. Research and Development Program of IG, CEA).

Table A3. Attributes of fault data.
Field                   Description
FaultSegmentName_Ch     Fault segment name (in simplified Chinese)
FaultSegmentName_En     Fault segment name (in English)
FormerFaultName         Former name of fault (in simplified Chinese)
Feature_Ch              Kinetic property and detectability of the fault segment (in simplified Chinese)
Feature_En              Kinetic property and detectability of the fault segment (in English)
Age                     Active age of the fault segment (in English abbreviations)

Table A6. Attributes of FormalEq20090101T20230730.

Author contributions. Conceptualization: XX, XW; data curation: GY, XW, GC, JR, KL, CX; formal analysis: KD, XX, XW, GY; funding acquisition: XX, GY, XW; investigation: XX, XW, GY, GC, XY, HY, XH; methodology: GY, XW, KD, GC, JR, KL, CX; project administration: XX, GY, XW; software: KD, XW; supervision: XX, GY; validation: XX, XW; writing: XW, XX. All the authors have read and agreed to the published version of the manuscript.

Competing interests. The contact author has declared that none of the authors has any competing interests.

Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.