Data warehouse for analysing music sales on a digital media store

Nowadays, every company knows that when making a decision that has a potential in affecting their assets, an accurately processed report is necessary in order to support the reasoning behind their decision. Generating a report for stakeholders quickly and accurately is highly required in assisting them making a data-driven decision. By developing a data warehouse, it is possible for a company to do a data-driven decision making to appeal to their customer segments. This paper proposes a data warehouse model design to analyse the sales data contained in the database. The method that was implemented for this particular data warehouse development is the nine-step methodology designed by Kimball. The results are then presented in pdf form and an interactive dashboard.


Introduction
Almost every company that exists now is using a database to store their data. They realized that data stored in a database are very valuable, as it can provide an up-to-date and accurate information on how business is doing [1]. The stored data in the database needed to be processed to produce a result of information that is useful for analysis purposes. In processing the data, it is necessary to create a reporting system that is not only fast, but also accurate.
A reporting can be done by creating a long, detailed, and complex SQL Query, but doing so will take too much time compared to developing a data warehouse. A data warehouse is much more efficient than querying because it processes data into information and from information into a report without taking a lot of time reading any unnecessary data [2] [3]. All the data that is processed in a data warehouse will provide relevant information to management and executives in a company to help them make a business decision [4].
In this particular case study, the company already used a database to store their transactional data, but because they did not implement a data warehouse in their system, they are unable the generate a report data automatically and efficiently. Currently, reporting the sales and analysing them is very timeconsuming as they need to either export the database into a spreadsheet or make a complex query then analysing them further. Therefore, the development of a data warehouse is going to be very beneficial for this company, as the result will be the data displayed in various reports that will assist the company in making a data-driven decision.
The methodology that was used for the data warehouse development is the 9-step Kimball method. This method is conceived around the 1980s and still being used until now. It has been adopted by many companies and is a mainstream industry standard practice [5].

Data Warehouse Concept
A data warehouse is a relational database that was created for a query process that was meant to aid the process of analysis and reporting [6]. It is developed by integrating data from multiple sources that support analytical reporting and decision making. Nowadays data has become the prerequisites in making a decision thus causing the interests in creating analytics reports using a data warehouse increased exponentially [7]. The main purpose of a data warehouse is to assists companies in performing strategic planning and decision making based on long-term data storage to make a quick and accurate decision [8]. Developing a data warehouse requires data cleaning, data integration, and data consolidations [5].

Data Warehouse Design
The process in designing and developing a data warehouse can be done by first implementing the ninestep Kimball methodology, then using a modelling dimension create a dimensional model which in this case, is a star schema [9] [10]. An Entity relationship diagram (ERD) is used to draw the Online transactional process (OLTP). Then the most important process, ETL (extract, transform, load) are done. This process extract data from the source, transform the extracted data into something useful for the data warehouse requirement and load a function to input data by running a script periodically. After the ETL process, the results can be presented in the form of a report or a dashboard to for easier usability.

Kimball Methodology
In developing a data warehouse using the Kimball methodology, there are nine steps that need to be followed: 1. Choosing the process. The business process that was used as a model to design the data warehouse is this: to buy a song or album from the store, a user should register an account first. After that, the user can log in and proceed to the purchasing process. In this process, the user can choose the items they would like to buy, and then adding it to their cart, where the system will automatically calculate the amount that user should pay. The user can then proceed to checkout, choose their preferred payment type, then complete the transaction by paying. Items that the user just bought can then be downloaded from the library in their account.
2. Choosing the grain. In song sales, the analysis covers the number of songs sold and song total sales. The analysis will be done for each country, genre, track, playlist, and period (months, quarter, year).

Storing pre-calculation in the fact table.
Star schema approach was chosen here in order to get the total transaction from various dimensions. Figure 1 shows the star schema of the digital media store. 6. Rounding out the dimensions table. The descriptions for each dimension tables are shown in table 2. 7. Choosing the duration of database. The data that will be processed into the data warehouse is the sales history data. The duration of the data is from when the database was created, 2009 until 2013, which means 5 years' worth of data is used.  The report can be viewed sorted by playlist 9. Decide the physical design. This step discusses the ETL process. The ETL process is done every day to ensure the data is up to date to ensure the information that will be received are accurate.

Extract, Transform, Load Process
This is the most important step in the data warehouse development, where the source data from an online transactional process is integrated into the data warehouse. Pentaho data integration was used in this ETL process [11]. Figure 2 shows the ETL process for every dimension table. Figure 3 shows the ETL process for the fact table, InvoicesFact. Figure 4 shows the successfully executed ETL job.

Results
After successfully executing the ETL process on Pentaho, utilizing the data warehouse is now possible. Generating reports on the data warehouse is much faster, and the report can be generated based on a various parameter. As an example, table 3 shows how many songs are sold for each genre and table 4 shows how many times a track has been bought.
A dashboard is also made to generate an overall report. In this case, the tool that was used to create the dashboard is Qlik Sense [12]. Figure 5 shows a pie chart of the total customer for each genre, and it is clear that the majority of customer bought a rock genre song. Figure 6 is a bar chart for the customer's country, and the chart shows that most of the customers are from the USA. Footnotes Footnotes should be avoided whenever possible. If required they should be used only for brief notes that do not fit conveniently into the text.

Conclusions
By developing a data warehouse, generating a report can be done easier and faster than querying SQL manually. Reports can be generated in many forms, such as a dashboard, making it more readable and easier to understand for the management and executives in a company. It can quickly show any sales data and history, customer preferences, and which products is the best-selling one. These reports can help the company management and executives in making a fast and accurate data-driven decision.