Dataset of human interventions as anthropogenic perturbations on the Caribbean coast of Colombia



Abstract
Human interventions in coastal areas always cause environmental impacts; however, inventories of these interventions are often poorly structured and rarely follow a specific standard. The raw data presented here result from an exhaustive and systematic review of satellite images covering 1700 km of the Caribbean coast of Colombia, in which 2743 human interventions were identified. These interventions are classified into 38 types in order to assess their environmental impact at a regional scale. The filtered data show the environmental impact obtained for each type and the values allotted to each of the four parameters used in this evaluation. Moreover, the data are filtered for each of the five environmental coastal units into which national regulations divide the Caribbean coast of Colombia. Finally, the filtered and processed data show the analysis performed to obtain the graphical results of a previous paper (An evaluation of human interventions in the anthropogenically disturbed Caribbean Coast of Colombia [1]). The dataset thus comprises three spreadsheets (xlsx) and two geographical files (kmz), ready to be used by any researcher, decision maker, land planner or practitioner interested in further analysis of environmental impact assessment in coastal areas. Additionally, the dataset is organised for educational exercises, so that professors or lecturers can repeat the same steps, from the inventory to the final results, in this study area or in their own.
© 2020 The Author(s).
• … areas using a freely available tool such as Google Earth. It also shows how to process, calculate and graphically represent the environmental impact in a simple way, which could be very useful for professors in environmental and marine sciences.
• The dataset consists of three spreadsheets, which allow future researchers and practitioners to repeat the same process at three levels of complexity: raw data for the inventory of human interventions, filtered and processed data for calculating environmental impact, and analysed data for statistical and graphical representations.
• The dataset can be used as a baseline for long-term monitoring of the human interventions on the Caribbean coast of Colombia and their environmental impact on coastal and marine ecosystems.

Data description
The dataset contains five files, provided as supplementary material: three spreadsheets in MS Excel format (xlsx) and two geographical files in Google Earth format (kmz). The first spreadsheet (DiB_Intervencoast_tables_Raw) contains the raw data for all 2743 human interventions found on the Caribbean coast of Colombia and serves as the inventory of 1700 km of coastline. This raw data file has 40 datasheets. The first lists the seven categories and 38 types of human intervention used, with their codes, descriptions and quantity of data (Table 1). The second consolidates all the human interventions identified in the five Environmental Coastal Units (ECU) of the study area, which add up to 3957 records. The remaining 38 datasheets list the human interventions of each type, giving the ECU, position mark, geocode in the kmz files, date of the satellite image and satellite source; each of these datasheets has the same colour as its category in the first, descriptive datasheet to make the file easier to use (Table 1). The difference between the total number of records (3957) and the number of interventions (2743) arises from the different geometries used to represent the interventions. Some interventions were marked as four-vertex polygons (e.g. aquaculture farms, towns, condominiums), others as two-vertex lines (e.g. roads, groins/jetties) and the rest as single points (e.g. hotels, military bases, ports). Polygons therefore have four records, corresponding to the four extreme cardinal points (N, E, S, W), and lines have two records, one for each end point.
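The relationship between records and interventions described above can be checked with simple arithmetic. If P, L and T denote the numbers of interventions drawn as polygons, lines and points, then P + L + T = 2743 and 4P + 2L + T = 3957, so 3P + L = 1214. The minimal Python sketch below encodes this rule; the function names are illustrative and not part of the dataset, and because the per-geometry counts are not reported here, only the two published totals are used.

```python
# Records (position marks) stored per geometry type, as described above:
# polygons as four extreme points (N, E, S, W), lines as two end points,
# single points as one record.
RECORDS_PER_GEOMETRY = {"polygon": 4, "line": 2, "point": 1}


def count_records(n_polygons: int, n_lines: int, n_points: int) -> int:
    """Total number of records for a given mix of geometries."""
    return (RECORDS_PER_GEOMETRY["polygon"] * n_polygons
            + RECORDS_PER_GEOMETRY["line"] * n_lines
            + RECORDS_PER_GEOMETRY["point"] * n_points)


def extra_records(n_interventions: int, n_records: int) -> int:
    """Records beyond one per intervention.

    Each polygon contributes 3 extra records and each line 1, so this
    difference equals 3 * n_polygons + n_lines.
    """
    return n_records - n_interventions


# Reported totals for the Caribbean coast of Colombia
print(extra_records(2743, 3957))  # 1214, i.e. 3 * polygons + lines
```

The same identity can be used as a consistency check whenever interventions are added to or removed from the inventory.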
The second spreadsheet (Intervencoast_tables_filtered.xlsx) has five datasheets with consolidated, filtered and processed data. The first datasheet gives the frequency of each of the 38 intervention types in each ECU (Table 2): each row shows the name and code of an intervention type, its number of occurrences in each of the five ECU and its total across the study area. This datasheet also contains the simplified environmental impact assessment of each intervention type (Table 3). This section has twelve rows, which fall into three groups. The first three rows give the intervention type, its frequency of occurrence and its percentage of the total number of interventions. The next six rows contain the four parameters (EXT = extension; INT = intensity; REV = reversibility; PER = persistence), the Unitary Environmental Impact (UEI) calculated from them, and the type's proportion of the overall UEI. The final three rows give the Total Environmental Impact (TEI) of each intervention type, which is a function of the UEI and the frequency of occurrence, its proportion of the overall TEI of the study area, and the accumulated frequency of TEI values.
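The percentage, proportion and accumulated-frequency rows described above are simple column-wise shares. A minimal Python sketch follows, assuming that the accumulated frequency is a running sum of TEI percentages after sorting types from highest to lowest TEI (the spreadsheet may order them differently); the type names and TEI values are hypothetical.

```python
from itertools import accumulate


def shares(values: dict[str, float]) -> dict[str, float]:
    """Percentage of the column total contributed by each intervention type."""
    total = sum(values.values())
    return {name: 100.0 * v / total for name, v in values.items()}


def accumulated_shares(values: dict[str, float]) -> list[tuple[str, float]]:
    """Running sum of percentages, largest contributor first (assumed order)."""
    ordered = sorted(shares(values).items(), key=lambda kv: kv[1], reverse=True)
    running = accumulate(pct for _, pct in ordered)
    return [(name, cum) for (name, _), cum in zip(ordered, running)]


# Hypothetical TEI values for three intervention types
tei = {"type_A": 6.0, "type_B": 3.0, "type_C": 1.0}
print(shares(tei))              # {'type_A': 60.0, 'type_B': 30.0, 'type_C': 10.0}
print(accumulated_shares(tei))  # [('type_A', 60.0), ('type_B', 90.0), ('type_C', 100.0)]
```

The same `shares` function applies unchanged to the FREQ and UEI rows.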
Table 1 Categories, types and description of human interventions in coastal areas and quantity of data for the Caribbean coast of Colombia.
The second datasheet of Intervencoast_tables_filtered.xlsx contains the filtered data used to graph the main frequency patterns of human interventions on the Caribbean coast of Colombia. Fig. 1 shows the UEI value for each typology, coloured by quartile (Q1 = red; Q2 = orange; Q3 = yellow; Q4 = blue). Fig. 2 compares the UEI and TEI values obtained for each typology; because UEI and TEI differ in order of magnitude, the left Y axis shows UEI and the right Y axis shows TEI. Fig. 3 shows the same comparison using normalised UEI and TEI values, so that both can be compared on the same scale. The third datasheet of Intervencoast_tables_filtered.xlsx contains the same data as the first, filtered to the 29 typologies found in the study area. These filtered data were used in the article [1] and for the pie charts in the fourth datasheet, which show the distribution of typologies in each of the five ECU; a pie chart with the consolidated data of the five ECU is also included. The last datasheet gives the UEI and TEI values for each typology in each ECU, which could be useful for further analysis of those geographical areas.
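The normalisation method behind Fig. 3 is not specified here. As one plausible option, the Python sketch below assumes min-max scaling to the interval [0, 1]; the function name and example values are hypothetical.

```python
def min_max_normalise(values: list[float]) -> list[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # avoid division by zero when all values are equal
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical UEI and TEI values for three typologies
uei = [0.5, 0.75, 1.0]
tei = [10.0, 55.0, 100.0]
print(min_max_normalise(uei))  # [0.0, 0.5, 1.0]
print(min_max_normalise(tei))  # [0.0, 0.5, 1.0]
```

Dividing each value by the column maximum would be an equally simple alternative; which one matches Fig. 3 would need to be checked against the spreadsheet.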
The third spreadsheet (Intervencoast_tables_boxplot.xlsx) contains the data filtered and organised to produce Figs. 4, 5A and 5B of the article [1]. These calculations are more complex than those of the second spreadsheet because they include more advanced statistical analysis. The first datasheet contains the box plot analysis behind Fig. 4 of [1], based on Tukey's fences, which identifies the extreme and mild TEI outliers in three filtered scenarios (29, 26 and 25 typologies). The next datasheet contains the data used for Figs. 5A and 5B of [1], which use the conditional formatting option of MS Excel to show graphically the TEI value for each typology and ECU and its percentage of the overall TEI.
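Tukey's fences for mild and extreme outliers can be sketched in Python as follows. The sketch assumes quartiles computed by linear interpolation (`statistics.quantiles` with `method="inclusive"`, comparable to Excel's QUARTILE.INC); the quartile function actually used in the spreadsheet is not stated, and the TEI values are hypothetical.

```python
import statistics


def tukey_outliers(values: list[float]) -> tuple[list[float], list[float]]:
    """Split values into mild and extreme outliers using Tukey's fences.

    Inner fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR.
    Outer fences: Q1 - 3*IQR and Q3 + 3*IQR.
    Mild outliers lie between the inner and outer fences;
    extreme outliers lie beyond the outer fences.
    """
    # 'inclusive' interpolation is assumed to mirror Excel's QUARTILE.INC
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    mild, extreme = [], []
    for v in values:
        if v < outer[0] or v > outer[1]:
            extreme.append(v)
        elif v < inner[0] or v > inner[1]:
            mild.append(v)
    return mild, extreme


# Hypothetical TEI values for eleven typologies
tei = [1, 2, 3, 4, 5, 6, 7, 8, 9, 18, 100]
print(tukey_outliers(tei))  # ([18], [100])
```

Applied to the TEI column of each filtered scenario (29, 26 and 25 typologies), this would give the same kind of classification as Fig. 4 of [1], provided the same quartile method is used.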
The two Google Earth files (kmz) that complement the dataset show the geographical location of each position mark describing the human interventions in the study area, and contain the complete inventory. Both files hold the same information, organised differently to make them easier to consult and manipulate: one groups the 3957 position marks by the 38 typologies of human intervention, while the other groups them by the five ECU. These two files are particularly valuable for any researcher or practitioner interested in viewing a specific human intervention or geographical sector, because Google Earth allows users to navigate the study area virtually (Fig. 4).
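For readers who prefer to inspect the kmz files programmatically rather than in Google Earth, a minimal Python sketch using only the standard library follows. It relies on the fact that a KMZ file is a ZIP archive containing a KML document; it assumes the KML 2.2 namespace and that position marks are stored as Placemark elements inside named Folder elements (one per typology or ECU). The folder structure of the actual files has not been verified, and the folder names in the demo are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by KML 2.2 files; older Google Earth exports may use a different one
KML = "{http://www.opengis.net/kml/2.2}"


def placemarks_by_folder(kmz_file) -> dict[str, int]:
    """Count the Placemarks directly inside each named Folder of a KMZ archive.

    kmz_file may be a path or a binary file-like object. The first .kml entry
    in the archive is assumed to hold the document. Folders sharing a name
    overwrite each other's count.
    """
    with zipfile.ZipFile(kmz_file) as kmz:
        kml_name = next(n for n in kmz.namelist() if n.lower().endswith(".kml"))
        root = ET.fromstring(kmz.read(kml_name))
    counts = {}
    for folder in root.iter(KML + "Folder"):
        name = folder.findtext(KML + "name", default="(unnamed)")
        counts[name] = len(folder.findall(KML + "Placemark"))
    return counts


if __name__ == "__main__":
    # Build a tiny in-memory KMZ with two hypothetical folders to show the output
    kml = (
        b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        b"<Folder><name>HOT</name><Placemark/><Placemark/></Folder>"
        b"<Folder><name>POR</name><Placemark/></Folder>"
        b"</Document></kml>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("doc.kml", kml)
    buffer.seek(0)
    print(placemarks_by_folder(buffer))  # {'HOT': 2, 'POR': 1}
```

Summing the counts over all folders of either file should, under these assumptions, reproduce the 3957 position marks.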

Study area
Colombia officially has three coastal zones, according to Decree 1120 of 2013: Continental Caribbean Coast, Insular Caribbean Coast and Pacific Coast. The dataset presented in this article covers the first of them. The same Decree defines five Environmental Coastal Units (ECU) in the study area: La Guajira peninsula (GUAJIRA); the northern slope of the Sierra Nevada de Santa Marta (VNSMR); Magdalena Delta and Canal del Dique (MAGDIQUE); Sinu Delta (SINU); and Darien Gulf (DARIEN). Their boundaries are shown in Fig. 5.
The approximately 1700 km shoreline of the study area alternates between low-lying coasts and deltaic plains, and high coasts along mountainous segments [2]. The low-lying coasts contain beaches, sand barriers and spits, normally associated with lagoons and mangrove swamps. The high coast sectors, on the other hand, consist of cliffs of sedimentary rocks at the northernmost end (La Guajira) and in the middle part (between Barranquilla and the city of Cartagena), while the cliffs around the Sierra Nevada de Santa Marta massif and at the southernmost end (Panama border) are formed by more resistant igneous and metamorphic rocks [3]. Between the deltas of the Magdalena and Atrato rivers, the coast is backed by Holocene marine terraces and influenced by mud diapirism [4]. In this process, rising low-density material reshapes the sea bottom, either deforming the overlying sediment layers or flowing out onto the continental shelf; in both cases shoals and islands can form, such as the El Rosario archipelago near Cartagena [5]. Similar phenomena occur on the coast (e.g. the mud volcanoes of Totumo and Arboletes), which are tourist attractions but also pose a significant risk to the surrounding population. According to the National Statistics Institute [6], large areas of the Caribbean region of Colombia (the departments of Choco, Cordoba, Sucre, Magdalena and La Guajira) have socioeconomic development based on the primary sector. Industry and the tertiary sector are highly concentrated in the densely populated area between Cartagena and Santa Marta, which represents less than a third of the coastline. Furthermore, the most populated cities of the study area (Barranquilla, Cartagena, Santa Marta, Cienaga and Riohacha) represent one sixth of the most populated cities in the country (over 3 million inhabitants), yet concentrate only slightly over 6% of the national population [6].
Regarding economic infrastructure, port activity is highly concentrated in Barranquilla and Cartagena, where the largest port facilities are located [7]. In addition, tourism in the '3S' category (Sun, Sea and Sand; [8]) is highly concentrated in Santa Marta, Cartagena and Coveñas [6, 9].

Inventory of human interventions
The inventory of human interventions in the study area was compiled using the structure of coastal uses and activities proposed by Botero [10]. This scheme served as a reference for selecting the 38 types of human intervention identified through Google Earth. An alphanumeric code system was defined to identify each intervention: the first three letters represent the ECU where the intervention is located, the next three letters represent the intervention typology, and the last three digits give its numerical order.
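The coding scheme can be sketched in Python as a pair of helpers that build and parse codes. The sketch assumes codes are written without separators and in upper case; the three-letter abbreviations for each ECU and typology are listed in the raw spreadsheet and are not reproduced here, so 'GUA' and 'HOT' below are hypothetical.

```python
import re

# Assumed layout (no separators): 3 letters for the ECU, 3 letters for the
# intervention typology and 3 digits for the sequential number.
CODE_PATTERN = re.compile(r"(?P<ecu>[A-Z]{3})(?P<typology>[A-Z]{3})(?P<order>[0-9]{3})")


def make_code(ecu: str, typology: str, order: int) -> str:
    """Build a code from its three parts, validating the assumed layout."""
    code = f"{ecu.upper()}{typology.upper()}{order:03d}"
    if CODE_PATTERN.fullmatch(code) is None:
        raise ValueError(f"cannot build a valid code from {ecu!r}, {typology!r}, {order!r}")
    return code


def parse_code(code: str) -> dict[str, object]:
    """Split a code back into its ECU, typology and numerical order."""
    match = CODE_PATTERN.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"not a valid intervention code: {code!r}")
    return {"ecu": match["ecu"], "typology": match["typology"], "order": int(match["order"])}


# 'GUA' and 'HOT' are hypothetical abbreviations used only for illustration
print(make_code("gua", "hot", 7))  # GUAHOT007
print(parse_code("GUAHOT007"))     # {'ecu': 'GUA', 'typology': 'HOT', 'order': 7}
```

Fixed-width fields keep the codes sortable: sorting them as strings groups interventions by ECU, then by typology, then by order.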
Data collection relied on Google Earth because it provides easy access to numerous satellite images of the study area with adequate horizontal and vertical resolution to observe the terrain relief and identify geomorphological units, both natural and anthropogenic [11, 12]. Most images came from the Google Earth satellite collection, but alternative imagery services were also used (Nokia, Bing, ESRI). Most of the georeferencing was done in Google Earth, although other geographic information systems, such as ESRI's ArcMap and the open-source gvSIG, were used to register the interventions identified in the alternative imagery.

Simplified environmental impact assessment
The environmental impact assessment was calculated using a simplified version of the Conesa [13] equation. First, the number of human interventions of each typology (FREQ) was counted in the MS Excel datasheet with the function "COUNTIFS". Next, the values for each attribute of environmental impact (EXT, INT, REV, PER) were allotted according to the levels defined by Conesa [13]. From these values, the UEI was calculated as their sum (MS Excel function "SUM") divided by the maximum environmental impact value (32). Finally, the TEI value was calculated by multiplying the UEI score by the previously counted frequency of occurrence. Details on the interpretation and relevance of each parameter and calculation are given in [1].
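The calculation chain described above (FREQ via COUNTIFS, UEI as the sum of the four attributes divided by 32, TEI as UEI times FREQ) can be sketched in Python. This illustrates the simplified formula as described here and is not the authors' spreadsheet; `count_ifs` is a hypothetical stand-in for COUNTIFS restricted to equality criteria, and the inventory rows and attribute scores are invented for the example.

```python
MAX_IMPACT = 32  # maximum environmental impact value used to scale the UEI


def count_ifs(rows: list[dict], **criteria) -> int:
    """Count rows matching every criterion (equality only), like COUNTIFS."""
    return sum(all(row.get(k) == v for k, v in criteria.items()) for row in rows)


def unitary_impact(ext: int, int_: int, rev: int, per: int) -> float:
    """UEI = (EXT + INT + REV + PER) / 32."""
    return (ext + int_ + rev + per) / MAX_IMPACT


def total_impact(uei: float, freq: int) -> float:
    """TEI = UEI * FREQ."""
    return uei * freq


# Hypothetical inventory rows and attribute scores (not the values allotted in [1])
inventory = [
    {"ecu": "ECU_1", "typology": "type_A"},
    {"ecu": "ECU_1", "typology": "type_A"},
    {"ecu": "ECU_2", "typology": "type_A"},
    {"ecu": "ECU_2", "typology": "type_B"},
]
freq = count_ifs(inventory, typology="type_A")   # 3
uei = unitary_impact(ext=4, int_=8, rev=4, per=4)  # 20 / 32 = 0.625
print(freq, uei, total_impact(uei, freq))          # 3 0.625 1.875
```

Passing several criteria, e.g. `count_ifs(inventory, ecu="ECU_2", typology="type_A")`, mirrors the multi-criteria COUNTIFS needed for per-ECU frequencies such as those in Table 2.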

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that might have appeared to influence the work reported in this paper.

Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.dib.2020.105847.