Toward Agile BI By Using In-Memory Analytics

This paper explores one of the newer technologies related to the field of Business Intelligence: in-memory technology. The new class of in-memory BI tools turns a BI solution into an agile BI solution. Also, the paper focuses on the main data models used by in-memory BI technologies and tries to answer following questions: Which are the main characteristics of an agile data model? And, which is the best data model that can be used for enabling an agile BI solution?


Introduction
In the last years, emerging technologies such as interactive visualization, in-memory analytics and associative search marginalized IT role in building BI solutions.Figure 1 shows the trend in use of these technologies (as search terms) using Google Trends.We see that the interest for these technologies has increased in the last years.Interest over time Also, Figure 2 shows how these technologies affect businesses.These technologies allow business people to do basic exploration of larger data sets and to find better answers to business problems.In-memory technology has the potential to help BI systems to become more agile, more flexible and more responsive to changing business requirements.This section takes a look at the pros and cons of in-memory BI.The primary goal of the in-DOI: 10.12948/issn14531305/18.3.2014.03memory BI technology is to replace traditional disk-based BI solutions.The important differences between them are: speed, volume, persistence and price [1].
For decades BI solutions have been plagued by slow response times, but speed is very important in analysis and in-memory BI technologies are faster than disk-based BI technologies.In-memory BI technologies load the entire dataset into RAM before a query can be executed by users.Also, most of them can save significant development time by eliminating the need for aggregates and designing of cubes and star schemas.

Fig. 2. How in-memory analytics, interactive visualization and associative search affect businesses
The speed of in-memory technology makes possible more analytics iterations within a given time.  1 presents a comparative analysis using the following criteria: 1) the main characteristics; 2) query language; 3) data model [5].According to [5], an agile BI solution requires: 1) an agile development methodology; 2) agile BA; and 3) an agile information infrastructure.An agile information infrastructure must be able to extract and combine data from any data sources, internal and external sources including relational, semistructured XML, multidimensional and "Big Data.According with these requirements, the main characteristics of a data model for agile BI are:  adaptable to rapid business changes;  agile design;  high flexibility to analysis;  excellent speed of analysis;  easy and universal access to any data sources.Most of in-memory BI solutions use the fol-lowing data models: dimensional model (star schema, snowflake or combinations), hypercube and "associative" data model.Which of them are agile?The next section tries to answer to above question.Also, the next section briefly presents a comparative analysis of these data models using the following criteria: 1) basic concepts; 2) modeling approach; 3) flexibility.

Measures and Dimensions versus Free Dimensional Analysis
In dimensional model and hypercube we distinguish between measures and dimensions.The main concepts of dimensional model are: facts, dimensions, granularity and hierarchies.The main characteristics of dimensional model are:  the information can be classified into: facts (data elements that are analyzed) and dimensions that provide descriptive information about each fact;  a fact is a numeric attribute of fact table.
Values of facts changes continue;  the granulation of a fact refers to the level at which the information is stored in the dimensional model;  usually the dimensions contain static data and are not normalized;  most dimensions have hierarchies;  dimensions are essential for data analysis;  the associations between fact tables and dimensional tables are defined explicitly with foreign keys;  SQL queries a subset of tables from dimensional model.Query result sets are independent of each other.The hypercube is a set of variables/measures, which use the same dimensions for identification.The main concepts of hypercube are: dimensions, hierarchies, hypercube cell, measures, and sparsity.In a hypercube, a dimension is represented by an axis and can have one or more members.Usually, a member can have only one parent.A member with no parent is called "root".A node with no child is called "leaf".Most dimensions have hierarchies.Hypercube cells contain basic measures and/or derived measures.Hypercube is implemented by multidimensional databases.The dimensional model and hypercube use predefined hierarchies for accessing and exploring data."Associative" model free users from the paradigm of dimensions versus measures (Figure 4).The model is implemented by Qlikview tool (Qlik Tech company, a leader in Magic quadrant for Business Intelligence and analytics platforms/Garter, 2014) [7]."Associative" model makes no distinction between attributes that are facts and attributes that are dimensions.The word "associative" puts emphasis on understanding how datasets relate to one another.This model is built around the concept of datasets with related logic tables.The datasets are loaded in memory, in a compressed and fully normalized format, via the Load script.with the identification of the important processes from the company (where the data are collected).Dimensional modeling uses a waterfall approach with the following steps:  identifying the business process/processes which will be modeled.For each identified process the will be created one or more fact tables;  setting the granulation for each fact table;  setting the dimensional tables for each fact table.The granulation for each dimension will also be determined;  setting the basic and derived measures;  setting the dimensional attributes and their description;  how different changes in dimensions are managed (slowly changing dimensions)? storage of the pre-calculated aggregates in the aggregate fact tables.Hypercube modeling is divided into the main steps:  identifying the measures;  identifying the dimensions and the hier-archies;  defining the hypercube or multi-cube;  refining the hypercube or multi-cube (e.g., defining the aggregation formulas).A data warehouse is usually built before designing the hypercube.Associations between dimensions are not computed."Associative" model is a bottom-up model and it is developed by each department and then adopted by the company.The model and the user interface are developed together using an agile development approach (e.g.SCRUM).This approach changes the focus from data driven to the decision driven.This approach is divided into the following phases:  identifying the initial business requirements and the data requirements;  prioritizing the business requirements (SCRUM product backlog) and defining the data staging requirements (QVD files for larger deployments);  iterative execution phase (many sprints) that includes: data loading (configuring of the connections, development of the initial load script), data modeling, data provisioning, user interface development (use of the data for analysis), testing, user review and refining;  deployment [10] (Figure 5). 4 Some considerations about Advanced Business Intelligence Queries SQL ranking and windowing aggregate functions combined with nested queries enable you to answer complex BI queries, but it is difficult for end users to write these queries.Also, MDX language, used for querying the multidimensional data stored in hypercube, is difficult for end users.For example, we have the following BI query: "Finds the top two sellers for each city that contributes more than 5% of the sales within its region".The data sources are Vanzari.xls,Agenti.xls and Judete.csv.We will use a partial snowflake schema with a fact table: Vanzari and four dimension tables: Agenti, Judete, Articole and Calendar (Time dimension).The schema was implemented in Oracle DBMS.The hierarchies are: agentiorasjudetregiune, artIdcategorieid and tim-pidperioadaziualunaan.The excel file Vanzari has attribute Data.The attributes of Calendar table were defined using data functions.Data was imported from files (excel and csv).The Figure 6 shows the structure of tables.The query result is: This query uses:  the hierarchy: agen-tiorasjudetregiune;  dense_rank () function with logical partitions;  nested queries.Time dimension is not used in this query.If end-user wants to use Time dimension, then he needs to rewrite the SQL query.The query result set is independent of the previous query result set.A large variety of powerful analytics are available with "associative" model such as: aggregations on-the-fly, set analysis, comparative analysis, conditional analysis, calculated dimensions, and so on.Data sources are loaded in memory, via the Load script.Database is not required.The data model and the associations between data sources are generated automatically during the data load process.The end users don't need to use a definition language.Also, the end users don't need to use a query language.They only create a pivot table with three "dimensions"/fields (Regiune, Oras and a "calculated dimension") and one expression: sum (Vanzarea).The definition of the "calculated dimension" (Top 2 Agenti) is: Also, two list boxes were created: An and Regiune.We must select only the values of Regiune list (such as, Transilvania).The "calculated dimension" will return top 2 sellers for each city that contributes more than 5% of the sales within its region (such as, Transilvania), and a null value for all others (Figure 7a) [9] [11].But, we can make every data selection (any combination of year and region).For example, Figure 7b shows top 2 sellers for each city in 2012, only for Transilvania region and Muntenia region.In conclusion, more information, high flexibility, easier for end users to make Top N analysis than with SQL/MDX.You can select any value for any attribute and all the related data from the entire data model will be displaying (associative search).For example, when we select 2013 in the An list box, the screen automatically updates to show the associated data in the Regiune list box.The Bucovina region is shown with a gray background to indicate that is not associated (we have no sales in Bucovina, in 2013).Selection is green, unrelated data is gray and associated data is white (Figure 7c  QlikView Set Analysis is a powerful tool for comparative analysis, for example, current year versus previous year, ordered products versus products not ordered, or selected data versus the unselected data.

Fig. 1 .
Fig. 1.In-memory analytics versus interactive visualization versus associative search.Interest over time

Figure 3
Figure 3 presents a disk-based BI solution versus an in-memory BI solution (e.g. a Qlikview BI solution).
Data model and interface inside of Qlikview document, in RAM User Inter face Data sources ETL DW hypercube Data sources ROLAP (SQL queries are generated graphically, flexible, not user-friendly) Data Warehouse is not required.Load data and then work off-line MOLAP (low flexibility, limited number of dimensions, aggregated data into cubes) DOI: 10.12948/issn14531305/18.3.2014.03

Fig. 5 .
Fig. 5.An agile development approach for the "associative" data model

Fig. 7 .
An advanced BI query with "associative" data model Also, we can select data based on associated values.For example, we want to see regions where we sold the product "Capac protectie".The search term will not only be checked against the Regiune list box, but also against the content of the entire data model (Figure8).

Figure 10
shows a comparative analysis using different time pe-riods: current month versus previous month versus two months earlier.We can compare results for three different time periods in one single view based on the same selection state.The comparisons are dynamic and based on the user's selections.

Fig. 11 .
Fig. 11.Comparative analysis between different periods using alternate states

Table 2
presents a comparative analysis between the data models and tries to show which of the models is agile.We see that the "associative" model has almost all characteristics of an agile data model.DOI: 10.12948/issn14531305/18.3.2014.03

Table 2 .
Hypercube versus Star schema versus associative model In conclusion this paper has identified the main characteristics of an agile data model.Considering these characteristics, the paper made a comparative analysis of the data models used in-memory BI technology: dimensional model, hypercube and "associative" model.The "associative" model has almost all characteristics of an agile data model.Also, three emerging technologies (interactive visualization, in-memory analytics and associative search) are implemented by Qliview.Deploying of these technologies in a BI solution results in an agile BI.