Business Intelligence (BI) is almost a throw away term nowadays, just like other enterprise software terms like CRM. So, what constitutes a full BI stack? Unlike other enterprise software where the whole package usually comes together, BI seems to be constituted of few loosely integrated software. Hence, identifying a full stack, as well as finding out which components fit where in the stack, can be difficult. However, generally, a full BI stack consists of the following, and the good news is one can get the whole stack for free through open source avenue
Extract, Transform, and Load
The first layer above the existing data sources is the Extract, Transform, and Load (ETL). In essence this layer bridges the data sources with the Data Warehouse (see below). Of course this can be performed even with the most basic tool like text file dump and upload, but actually using purpose-built ETL tool will help in the following aspects:
- visualisation of the ETL process and orchestration
- database abstraction layer for popular enterprise software like SAP or SugarCRM
- easy application of common transformation tasks without writing a single line of code. Common transformation tasks include data cleansing, what to do with null values, type casting, value mapping, row to column transformation, calculation, etc
- automatic detection of append or rewrite, so content of Data Warehouse can be made up to date without marking of extracted source data or truncating destination data first
- built-in debugging system
- automated logging of process result, preferably in separate database
- orchestration of ETL processes, e.g. perform ETL for dimensions in the star schema first before performing it for its fact
- scheduling of regular refresh of Data Warehouse
The logic behind creation of Data Warehouse is performance optimisation. Theoretically, data can just be pulled from source systems and displayed any way you please. But if the sources contain lots of data, say hundreds of thousands of insurance policies, then it is easily understood that performing year-on-year analysis directly on the data source is going to take a long time. On top of that, anybody using the source systems will experience a slow down. So, think of Data Warehouse as some kind of very lazily updated materialised view.
The performance optimisation doesn’t stop by setting up separate database for the Data Warehouse. Even the structure of the data storage is optimised for analysis. Here, denormalisation is acceptable to certain degree in order to get optimum query performance. One of the best way to store data for this purpose is by setting it up in star schema configuration, where a fact, the measurable business information, is separated from dimensions, additional information about the fact.
There is actually a separate software category called Multi-dimensional DataBase Management System (MDBMS) optimised for this kind of data storage, Essbase (currently part of Oracle) being the first. However, in absence of purchasing power for this software category, one can make do with usual Relational DataBase Management System (RDBMS).
Data storage optimised for quick access may not be optimised for presentation to users. That’s where semantic layer come into place. The concept was first used by one of the best BI suite out there, Business Objects. The modification performed by this layer includes:
- Changing data label (i.e. field name) from cryptic name like premium_trx to more understandable term like Premium in transaction currency
- Creation of derivative information that users may need, but usually not stored in database for efficiency reasons. For example, based on the transaction date, which is stored in database, the semantic layer will calculate the quarter and semester when the transaction happened, information which may be of value to users
- Rearrange the grouping of information to be more intuitive to users. For example, product groups may be normalised to separate table from products in database, but when presented to users, product groups and products should be side by side
- Report server can act as hub where reports from all departments are gathered together so that they’re easier to find
- Report server controls who can see which reports
- Report server can automate the regular distribution of reports to all related parties
A business information can be viewed and analysed from many different angles. To actually anticipate all the different ways an information can be analysed and to predefine a report for each of the way is unrealistic. The better approach is to give users freedom to form any way to look at data as they please. This is called ad-hoc reporting capability. Basically, after presentation of information has been simplified by semantic layer, users can just rearrange those information into new form by simple means like drag and drop.
Lastly, busy executives would like to be able to quickly assess the health of the company in just a few information-rich pages. Analogous to cars’ dashboard where as much information is crammed to a small area that can be glanced easily by the driver, a BI dashboard presents computer screens full of summarised information for management decision making. Key features in this part of the BI solution are:
- easy layouting
- easy integration to centralised intranet portal for easy access by the executives
- ability to retain security settings even when viewed from separate intranet portal server, not directly from BI server
In line with management by exception principle, it will be nice if the BI suite automatically sends out predefined alerts to executive in charge whenever there’s anything out of ordinary that needs his/her attention
- management must identify what conditions are considered out of ordinary
- reports must be designed to communicate each of these conditions as succinctly as possible to busy executives in charge of the matter
- BI suite will then regularly check for extraordinary conditions above and automatically notify the executives via e-mail when it happens