Real estate announcements monitoring dataset for Latvia 2018

The dataset is a collection of real estate announcements published in 2018 on the leading Latvian advertisement website www.ss.com [1]. In Latvia, this website is an alternative information source to the several (5–7) large real estate agencies. It has no important competitors in Latvia: its closest competitor, reklama.lv [2], is 4–5 times smaller. www.ss.com carries announcements from small and medium-sized agencies, as well as from individuals who want to take part in the real estate market. The collected dataset reflects the dynamics observed over the 12 months of 2018 and contains 238 thousand observations in total. The dataset has 24 dimensions. These include the price stated in the announcement, the deal type, and the location of the real estate (region, district, address). They also include the characteristics of the real estate: its type (land, flat and so on), its size, and the main characteristics of each type, such as land area or the number of rooms in apartments. The dataset is hosted in the Data Archiving and Networked Services (DANS) repository [3].


Data description
The database consists of Latvian real estate announcements, 238 thousand observations in total. The dataset includes 24 dimensions (fields): observation month; real estate chapter (7 chapters, for example flats, land and so on); sub-chapter (18 sub-chapters); deal type (such as selling offer or rental offer); price (euro); price unit (4 units, such as per day or per month); region; district; address; rooms; area; area units; floor; floors in the living building; elevator option in the building; building series name; building type; facilities; floors in the building; rooms in the building; land area; land area units; amenities and description; and land purpose.
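To make the 24 fields concrete, the record below sketches one possible in-memory layout for a single announcement. The snake_case field names, the Python types and the example values in the comments are assumptions made for illustration; they are not the column names used in the DANS files [3].

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Announcement:
    """One observation (announcement). Field names are hypothetical."""
    observation_month: str               # e.g. "2018-01"
    chapter: str                         # one of 7 chapters, e.g. "flats", "land"
    sub_chapter: str                     # one of 18 sub-chapters
    deal_type: str                       # e.g. "sell", "rent"
    price_eur: Optional[float]           # price in euro
    price_unit: Optional[str]            # one of 4 units, e.g. "per month"
    region: Optional[str]
    district: Optional[str]
    address: Optional[str]
    rooms: Optional[int]
    area: Optional[float]
    area_units: Optional[str]
    floor: Optional[int]
    floors_in_living_building: Optional[int]
    elevator: Optional[bool]
    building_series: Optional[str]
    building_type: Optional[str]
    facilities: Optional[str]
    floors_in_building: Optional[int]
    rooms_in_building: Optional[int]
    land_area: Optional[float]
    land_area_units: Optional[str]
    amenities_and_description: Optional[str]
    land_purpose: Optional[str]
```

With this layout, `len(fields(Announcement))` returns 24, matching the field count above. Fields that do not apply to a given chapter (for example land area for a flat) would simply hold `None`.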
Specifications Table
Subject: Business, Management and Accounting (General)
Specific subject area: Business and management, marketing, business intelligence, econometrics
Type of data:
The amount of incorrect information on the internet, known as "fake news", has grown in recent years. It is probably possible to influence the market through advertisement websites with "fake announcements". For data scientists and other researchers, developing criteria for identifying "fake announcements" is a good challenge, and such criteria could later be used in other fields.
The author used the dataset to estimate how infrastructure objects in Riga (such as schools, shops and public transport stops) influence sellers' valuation of their own properties. The author also used it to develop price indices in different dimensions for flats in serial buildings. The dataset is based on real estate market announcements monitored during 2018. It combines different deal types (sale offers and rental offers), different regions and different property types. The dataset will be very helpful for real estate market specialists. The data are original and were collected by the author. The average life cycle of an announcement is 4 weeks, after which announcements are deleted from the internet. As a result, the collection cannot be repeated for 2018, so the dataset is unique. However, the scientific protocol for collecting the data, published in this article, allows current data to be downloaded. Current data can be used to check the scientific quality of the 2018 data collection, as well as to understand how the market changes over time. Table 1 presents the dataset observations by chapter and deal type.
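The author's index methodology is not described in this section. Purely as an illustration, the sketch below builds a simple monthly index. It takes the unweighted mean price per m² in each month and rescales it so that a chosen base month equals 100. The function name `mean_price_index` and the input shape (tuples of month, price in euro and area in m²) are hypothetical, and a production index would usually control for flat characteristics rather than use a plain mean.

```python
from collections import defaultdict
from statistics import mean


def mean_price_index(records, base_month):
    """Unweighted mean price per m2 per month, rescaled so base_month = 100.

    records: iterable of (month, price_eur, area_m2) tuples for one group,
    e.g. sale offers for flats in one building series (hypothetical shape).
    """
    per_month = defaultdict(list)
    for month, price, area in records:
        if price and area:  # skip missing or zero price/area values
            per_month[month].append(price / area)
    means = {month: mean(values) for month, values in per_month.items()}
    base = means[base_month]  # raises KeyError if the base month has no data
    return {month: round(100 * means[month] / base, 1) for month in sorted(means)}


records = [
    ("2018-01", 50000, 50),  # 1000 EUR/m2
    ("2018-01", 60000, 50),  # 1200 EUR/m2
    ("2018-02", 72600, 60),  # 1210 EUR/m2
]
print(mean_price_index(records, "2018-01"))  # {'2018-01': 100.0, '2018-02': 110.0}
```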
In the author's opinion, the most important data field is the price of the real estate. The dataset includes prices for different object groups, such as land, buildings and flats. Prices cannot be compared across groups, only within a group. Fig. 1 shows a price-per-square-metre analysis for the flats group in Riga. Fig. 2 shows the collected real estate announcements by Latvian district. Riga district has 38.2 thousand observations (14% of all observations), while the capital of Latvia, Riga city, has 90 thousand observations (33% of all observations). This is why Riga city real estate is analyzed separately, in Fig. 3.
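Because prices are only comparable within a group, a price-per-m² summary should be computed per group. The sketch below does this under stated assumptions. It takes rows as dictionaries with hypothetical keys (`chapter`, `deal_type`, `price_eur`, `area`, `area_units`). It assumes that square metres are encoded as the string `"m2"`, and it uses the median as an illustrative choice of statistic.

```python
from collections import defaultdict
from statistics import median


def price_per_m2_by_group(rows):
    """Median price per m2 for each (chapter, deal_type) group.

    Rows whose area is not in square metres, or that lack a price or area,
    are skipped, so each value is comparable only inside its own group.
    """
    groups = defaultdict(list)
    for row in rows:
        if row.get("area_units") != "m2" or not row.get("price_eur") or not row.get("area"):
            continue
        groups[(row["chapter"], row["deal_type"])].append(row["price_eur"] / row["area"])
    return {key: median(values) for key, values in groups.items()}


rows = [
    {"chapter": "flats", "deal_type": "sell", "price_eur": 100000, "area": 50, "area_units": "m2"},
    {"chapter": "flats", "deal_type": "sell", "price_eur": 90000, "area": 60, "area_units": "m2"},
    {"chapter": "flats", "deal_type": "sell", "price_eur": 120000, "area": 40, "area_units": "m2"},
    {"chapter": "land", "deal_type": "sell", "price_eur": 5000, "area": 1000, "area_units": "m2"},
    {"chapter": "land", "deal_type": "sell", "price_eur": 9000, "area": 2, "area_units": "ha"},
]
print(price_per_m2_by_group(rows))
```

The result keeps flats and land under separate keys, so a flat price per m² is never compared with a land price per m².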
The next most important indicators are the location of the real estate and the building series. Data for these two dimensions are shown in Table 2.
Note that Table 2 covers only the flats most often offered for sale. The dataset also contains districts with fewer than 20 observations and building series with fewer than 100 observations. Because these groups are so small, their data (222 observations in total) were removed from Table 2. The dataset itself includes all data. Table 2 shows that, for scientific analysis, the data should be cleaned of districts with fewer than 100 observations and building series with fewer than 150 observations.
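The cleaning rule above can be sketched as a filter that counts observations per district and per building series and then keeps only rows in sufficiently large groups. The thresholds (100 and 150) come from the text. The row keys `district` and `building_series` and the function name are hypothetical.

```python
from collections import Counter


def filter_small_groups(rows, min_district=100, min_series=150):
    """Keep flats whose district and building series are both well populated.

    Districts with fewer than min_district observations and building series
    with fewer than min_series observations are removed. Counts are computed
    once on the unfiltered rows.
    """
    district_counts = Counter(row["district"] for row in rows)
    series_counts = Counter(row["building_series"] for row in rows)
    return [
        row for row in rows
        if district_counts[row["district"]] >= min_district
        and series_counts[row["building_series"]] >= min_series
    ]
```

Because counts are taken once on the unfiltered data, removing a small district can leave a building series just below its threshold. Repeating the filter until the row count stops changing is a stricter alternative.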

Experimental design, materials, and methods
In Latvia, market advertisements on the internet can be observed with data mining technologies. On average, 20 thousand real estate offers per month were collected in Latvia in 2018 (from 16.5 thousand in January to 22.3 thousand in May). The data were cleaned by removing duplicates, "fake announcements" and mistakes. The remaining observations were then compared with the official deal statistics published by the State Land Service [4]. The collected announcements exceed the number of real deals approximately two times (15,500 announcements versus 6,932 deals). However, taking into account that the dataset collects offer prices, the average prices do not differ greatly. This means that for each real deal there are several (approximately two) announcements. Another reason for the difference may be the State Land Service's data aggregation methodology: it aggregates data from February to February for each year. The State Land Service publishes only aggregated data free of charge and does not publish information about individual deals. Data mining makes it possible to obtain more detailed information. This shows the advantage of data science and data mining methods for collecting and providing information for different purposes.
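The article does not state how duplicate announcements were identified. The sketch below shows one common approach, assuming that two announcements are duplicates when a chosen set of fields matches after trimming and lowercasing the text. The key fields and function names are hypothetical. The small helper reproduces the announcements-per-deal ratio from the figures quoted above, which comes to about 2.24.

```python
def normalise(value):
    """Trim and lowercase text so trivially different spellings compare equal."""
    return value.strip().lower() if isinstance(value, str) else value


def deduplicate(rows, key_fields=("chapter", "deal_type", "address", "area", "price_eur")):
    """Keep the first occurrence of each announcement, judged by key_fields.

    key_fields is a hypothetical choice; the article does not say how
    duplicates were identified.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(normalise(row.get(field)) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def announcements_per_deal(n_announcements, n_deals):
    """Ratio of cleaned announcements to registered deals."""
    return n_announcements / n_deals


# Figures quoted in the text: 15,500 announcements versus 6,932 deals.
print(round(announcements_per_deal(15500, 6932), 2))  # 2.24
```

Including the price in the key treats a re-posted announcement with a changed price as a new observation. Dropping `price_eur` from `key_fields` would instead collapse such re-posts into one record.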
The database was collected from the real estate section of www.ss.com [1], the biggest advertisement website in Latvia. Collection used an R Studio data-scraping script repeated monthly (the scientific protocol for collecting the data). After the data were downloaded, the dataset was created with standard R Studio functions.
Riga Technical University (RTU) is one of the leading institutions in the region in quantitative economic analysis and mathematical modeling. Monitoring of internet announcements started at RTU at the end of 2017 as part of integrating new data science technologies into the learning process. In addition, RTU researchers develop approaches and models for transport [5,6], logistics [7–9], the gas market [10,11], tax and duty policies [12] and medicine [13,14].
Table 2. Most offered for sale flats by district and building series in Riga in 2018, announcements (observations) (full data is available in the dataset [3]).