Dataset for holiday rentals’ daily rate pricing in a cultural tourism destination

This data article describes a holiday rental dataset from a medium-sized cultural city destination. The main features are the daily rate and variables related to location, size, amenities, rating, and seasonality. The data were extracted from Booking.com, the legal registration of the accommodation (RTA), and Google Maps, among other sources. The dataset covers 665 holiday rentals offered as entire flats (rentals per room were discarded), with a total of 1623 cases and 28 variables. For data extraction, the RTA is ordered by registration number; each number is taken and used in a Google search with the structure “apartment registration no. + Booking + Seville” to find the holiday rental's profile on Booking.com. It is then verified that both the address of the accommodation and the registration number match in RTA and Booking.com, and the data are extracted to a Microsoft Excel file. Google Maps is used to determine the walking time in minutes from the accommodation to the spot of maximum tourist interest in the city. A price index based on the average real estate price per square meter in each district is also incorporated into the dataset, as well as a visual appeal rating of every holiday rental, assigned by the authors on the basis of the photos in its Booking.com profile. Only cases with complete data were considered. A statistical summary of all variables in the collected data is presented. This dataset can be used to develop a model that estimates the daily price of a stay in a holiday rental from predetermined variables. Econometric methodologies applied to this dataset can also test which of the included variables affect holiday rentals' daily rates and which do not, and determine their respective influence on daily rates.


© 2019 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Data
The dataset contains raw data from 665 Sevillian holiday rentals (Table 1) offered as entire flats, extracted mainly through Booking.com searches, among other sources (Table 2). The Microsoft Excel worksheet provided as supplementary data for this article (see Appendix A) includes the complete dataset of 1623 cases and the 28 variables considered (Table 2). Summary statistics of the dataset's numerical and integer variables (Table 3) and categorical variables (Table 4) are also presented. Finally, Table 5 shows how the DINDEX variable is constructed.
Specifications Table
Subject: Tourism, Leisure and Hospitality Management
Specific subject area: Holiday rentals' daily rate pricing
Type of data: All holiday rentals legally registered in Seville and designated in Spanish as "viviendas con fines turísticos" (VFT) (i.e. homes for tourism purposes) that are offered in the full modality (i.e. the entire flat) were taken as the total research population, based on the "Registro de Turismo de Andalucía" (RTA) (i.e. the Andalusian Tourism Registry). Only cases with complete data were considered. The final sample includes the same lodging more than once when it is offered with different numbers of beds.
Description of data collection: The VFT registrations in RTA are ordered by registration number. The registration number is taken and a Google search is made with this structure: "[VFT registration code] + Booking + Seville". It is verified that both the address of the VFT and the registration number match in RTA and Booking.com. The data for all the variables considered are extracted for each VFT, and the price is taken in different time periods (see Table 2).
Data source location: The city of Seville, Andalusia, Spain
Data accessibility: The complete raw dataset is provided as a supplementary Excel file.
Table 1 presents the population, the sample, and the total number of cases included in the dataset. In the Andalusian legislation regulating holiday rentals in Seville [1], these accommodations are designated as "Viviendas con Fines Turísticos" (VFT) (i.e. homes for tourism purposes). A VFT can be rented in full (i.e. the entire flat) or in part (i.e. a spare room). All legally registered VFTs [2] were considered, excluding those rented as a spare room. The number of cases exceeds the number of VFTs because the same VFT can be offered at different prices depending on the number of beds. Table 2 shows all the variables considered in the dataset, with their type, description, and source.
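The distinction between rentals and cases described above can be illustrated with a minimal Python sketch. The column names `vft_code` and `beds` and the example rows are hypothetical, since the labels used in the worksheet are not given here; the sketch only shows how one VFT listed with several bed configurations contributes several cases.

```python
from collections import Counter

# Hypothetical column names: the labels used in the worksheet for the
# registration code and the number of beds are not stated in the text.
CODE_KEY = "vft_code"
BEDS_KEY = "beds"


def count_rentals_and_cases(rows):
    """Return (number of distinct VFTs, number of cases) for a list of row dicts."""
    cases_per_rental = Counter(row[CODE_KEY] for row in rows)
    return len(cases_per_rental), sum(cases_per_rental.values())


# Illustrative rows only (made-up codes, not taken from the dataset).
example_rows = [
    {CODE_KEY: "VFT-A", BEDS_KEY: 2},
    {CODE_KEY: "VFT-A", BEDS_KEY: 4},
    {CODE_KEY: "VFT-B", BEDS_KEY: 2},
]

rentals, cases = count_rentals_and_cases(example_rows)
print(rentals, cases)  # 2 distinct rentals, 3 cases
```

Applied to the full worksheet, the same count would be expected to give 665 rentals and 1623 cases, as reported in Table 1.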

Experimental design, materials, and methods
First, based on the RTA register [2], each VFT code and its address are copied into a Microsoft Excel worksheet (the dataset presented in this article) and sorted in ascending order by VFT code. Second, one by one, a Google search is run with the structure "[VFT registration code] + Booking + Seville". Third, the Booking.com profile of the VFT is opened and it is checked that both the VFT code and its address match the RTA record; the data for the variables considered (Table 2) are then extracted and copied into the Microsoft Excel worksheet. VFTs with incomplete data are discarded. The daily rate is copied for all the time periods considered (HSWD, HSWE, LSWD, LSWE, SE1, and SE2; see Table 2) and later weighted (see Table 4) to obtain a single PRICE variable. Once this process is finished, MIN (Table 2) is obtained one by one through Google Maps searches [6] and copied into the same Microsoft Excel file. The DINDEX variable is filled in following the criteria described in Table 2 with the data obtained in Table 5. Finally, all the photos available in each VFT's Booking.com profile are reviewed, and each VFT is rated by the authors for its visual appeal (VSAP, Table 2).
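Three of the steps above (composing the search string, discarding incomplete cases, and weighting the period rates into PRICE) can be sketched in Python as follows. This is only an illustration under stated assumptions: the "+" in the search structure is treated as a separator between terms, the completeness check is simplified to the six rate fields, and the equal weights in the example are placeholders, not the weights used by the authors. The registration code and the rates are made up.

```python
# Time periods in which the daily rate is collected (see Table 2).
PERIODS = ("HSWD", "HSWE", "LSWD", "LSWE", "SE1", "SE2")


def build_search_query(vft_code, separator=" "):
    """Compose the search string "[VFT registration code] + Booking + Seville".

    Assumption: the "+" signs only separate the search terms, so the terms
    are joined with a space by default.
    """
    return separator.join([vft_code, "Booking", "Seville"])


def is_complete(case):
    """Simplified completeness check: every period must have a daily rate."""
    return all(case.get(period) is not None for period in PERIODS)


def weighted_price(case, weights):
    """Weighted average of the six period rates (weights need not sum to 1)."""
    total_weight = sum(weights[period] for period in PERIODS)
    weighted_sum = sum(case[period] * weights[period] for period in PERIODS)
    return weighted_sum / total_weight


# Placeholder weights: equal weighting is an assumption made for this sketch.
equal_weights = {period: 1.0 for period in PERIODS}

# Made-up rates for one hypothetical case.
example_case = {"HSWD": 100, "HSWE": 120, "LSWD": 80,
                "LSWE": 90, "SE1": 150, "SE2": 110}

print(build_search_query("VFT/SE/00000"))  # hypothetical code format
if is_complete(example_case):
    print(round(weighted_price(example_case, equal_weights), 2))
```

Passing the weights as a parameter keeps the sketch independent of the actual weighting scheme, which can be substituted once it is read from the article's tables.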
To conclude, a statistical summary of the numerical and integer variables (Table 3) and of the categorical variables (Table 4) included in the dataset is presented.
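A summary of this kind can be reproduced with the Python standard library, as in the sketch below. The choice of statistics (count, mean, sample standard deviation, minimum, maximum, and frequencies) is an assumption about what such tables typically report, and the example values are made up for illustration rather than taken from the dataset.

```python
import statistics
from collections import Counter


def summarize_numeric(values):
    """Basic descriptive statistics for a numerical or integer variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def summarize_categorical(values):
    """Absolute and relative frequency of each level of a categorical variable."""
    counts = Counter(values)
    n = len(values)
    return {level: (count, count / n) for level, count in counts.items()}


# Made-up walking times in minutes, standing in for the MIN variable.
example_min = [5, 10, 15, 20]
# Made-up levels of a hypothetical yes/no amenity variable.
example_amenity = ["yes", "no", "yes", "yes"]

print(summarize_numeric(example_min))
print(summarize_categorical(example_amenity))
```

With the full worksheet loaded as one list per column, the same two functions could be applied to each variable according to its type in Table 2.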