OLAP, what’s coming next? (2024)



Are you on the lookout for a replacement for Microsoft Analysis Services cubes? Are you looking for a big data OLAP system that scales ad libitum, or do you want your analytics updated even in real time? In this blog, I want to show you possible solutions that are ready for the future and fit into your existing data architecture.

What is OLAP?

OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modelling. An OLAP cube is a multidimensional database optimised for data warehouse and analytical workloads: it stores data in multidimensional form, generally for reporting purposes, with data (measures) categorised by dimensions. To manage and query OLAP cubes, Microsoft developed a query language known as Multidimensional Expressions (MDX) in the late 1990s. Many other vendors of multidimensional databases have adopted MDX for querying data, but this specialised language means that managing a cube requires personnel with a correspondingly specialised skill set.
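The idea of measures categorised by dimensions can be sketched in a few lines of plain Python. This is a toy illustration with made-up sales records, not how a cube engine is actually implemented:

```python
from collections import defaultdict

# Toy fact records: three dimensions (year, region, product), one measure (sales).
rows = [
    {"year": 2023, "region": "EU", "product": "A", "sales": 120},
    {"year": 2023, "region": "EU", "product": "B", "sales": 80},
    {"year": 2023, "region": "US", "product": "A", "sales": 200},
    {"year": 2024, "region": "EU", "product": "A", "sales": 150},
]

def aggregate(rows, dims):
    """Sum the 'sales' measure, grouped by the given dimensions."""
    cube = defaultdict(int)
    for r in rows:
        key = tuple(r[d] for d in dims)
        cube[key] += r["sales"]
    return dict(cube)

# Total sales per region, across all years and products:
print(aggregate(rows, ["region"]))  # {('EU',): 350, ('US',): 200}
```

A real cube pre-computes and stores many such aggregations so that any combination of dimensions can be answered instantly; this is what MDX queries against.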

Why replace OLAP-Cubes?

In my job as a business intelligence specialist and data engineer, I have always used OLAP technologies for quick ad-hoc analysis with tools like Tableau, TARGIT, Power BI, Excel, etc. As the Microsoft BI stack is still widely used in small to large companies, the technology behind it is mostly SQL Server Analysis Services (SSAS), i.e. Microsoft cubes. These are a compelling and fast way to aggregate and pre-calculate all of your corporate data and make it available to your operational business users. Nevertheless, these cubes are increasingly reaching their limits, as you might have experienced in your own company. These are the problems I see or have encountered:

  • The multidimensional model of Analysis Services is not supported in the Azure cloud, and it does not look like it will be any time soon (there is the tabular model of Microsoft cubes, but it shares several of the following limitations).
  • There is a limit to the amount of data it can process quickly. It was developed a long time ago and is not optimised for today's vast data volumes.
  • It does not fit into the open-source and big data ecosystem because:
    • It is not distributed and cannot be parallelised like modern technologies (containers, clusters, etc.).
    • Cubes are added or edited in Microsoft Visual Studio, which is hard to source-control and makes it impossible for any data-savvy person apart from the BI specialist to modify them.
  • The query language MDX is considerably hard to understand, and more complex queries are difficult to write.

Therefore I researched alternatives on the world wide web. But before I go into more depth about what I found, I want you to look at the architecture as a whole. One reason is that nowadays countless people wish to make use of data, so you might want to make it available to all of them, or at least have a more open architecture. In my opinion, the Data Lake is the buzzword for this: everyone can easily access the data, instead of only BI specialists or, worse, only the infrastructure people with access to the FTP server.

To illustrate this, I like the architecture from DWBI1, which shows it as generically as possible:

[Figure: generic architecture with Data Warehouse and Data Lake]

What you also see is the difference between corporate data, which traditionally goes into the Data Warehouse (DWH), and real-time streaming data such as social media and IoT, which goes into the data lake. As Data Lakes make it easier for data-savvy people to access data, you should always keep in mind where you put which data. If you want to know more about that topic, read my other blog post about Data Warehouse vs Data Lake.

Back to the OLAP cubes, which in the architecture above sit on top of the Data Warehouse as In-Memory Models or a Semantic Layer. As you can see, OLAP is still used for essential tasks such as data exploration, analytics and self-service BI. I would say that Data Lakes are not fast enough for these needs, where speed and short response times are critical in most organisations. On the other hand, Data Lakes are very beneficial for Data Scientists, because they are interested in all the data and do not depend on sub-second query responses. A Data Scientist can therefore explore, analyse and run machine learning models on top of the Data Lake very conveniently.
But what is coming next after, e.g., Microsoft Analysis Services? In the following chapters, I show you the different technologies that can serve as a modern, scalable and fast replacement.

List of Cube-Replacements

Because you can make a one-to-one replacement, switch to a faster cloud backend, virtualise a semantic layer, or use a serviced cloud provider, I have categorised the different technologies into the following groups.

OLAP-Technologies

With OLAP technologies, you replace your cubes one-to-one with another technology. You don't change anything in your current architecture; you simply replace your cubes with a modern, big-data-optimised technology that focuses on the fastest possible query response times. See the Appendix for a comparison of modern OLAP technologies.

Cloud Data Warehouses

Another approach is to move your on-premise Data Warehouse to a Cloud Data Warehouse to get more scalability, more speed and better availability. This option is best suited for you if you do not necessarily need the fastest response times and you do not have tera- or petabytes of data. The idea is to speed up your DWH and skip the cube layer entirely. This way you save a lot of time developing, processing and maintaining cubes. On the other hand, you pay for it with higher query latency while you create your dashboards. If you mainly have reports anyway, which can be run beforehand, then this is perfect for you.

Data Virtualisations

If you have many source systems built on different technologies, but all of them respond rather quickly and you don't run a lot of operational applications on them, you might consider Data Virtualisation. In this approach you don't move, copy or pre-aggregate data; instead, you have a semantic middle layer where you create your business models (like cubes), and only when you query this virtualisation layer does it query the underlying data sources. If you use, e.g., Dremio, it relies on Apache Arrow technology, which caches and optimises a lot in-memory for you, so that you get astonishingly fast response times as well.
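To make the lazy-querying idea concrete, here is a hypothetical, much simplified Python sketch of what a virtualisation layer does: the source is only queried on first access, and later reads are served from an in-memory cache. The `VirtualView` class and `load_from_source` function are illustrative, not any vendor's API:

```python
class VirtualView:
    """A toy semantic-layer view: queries its source lazily and caches."""

    def __init__(self, query_source):
        self._query_source = query_source  # callable hitting the real source
        self._cache = None

    def rows(self):
        if self._cache is None:            # first read hits the source...
            self._cache = self._query_source()
        return self._cache                 # ...later reads come from the cache


calls = {"count": 0}

def load_from_source():
    # Stand-in for a real query against a remote system.
    calls["count"] += 1
    return [("EU", 350), ("US", 200)]

view = VirtualView(load_from_source)
view.rows()
view.rows()
print(calls["count"])  # 1 -- the source was only queried once
```

Real products add query pushdown, reflections and distributed caching on top of this basic pattern, but the contract is the same: the business model is defined once, and the data stays in place.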

Serviced Cloud and Analytics

The last option is to buy from a serviced cloud storage or analytics vendor like Looker, Sisense or Panoply. These are very easy to use and create implicit cubes for you: you just join your data together in the semantic layer of the respective tool, and all the rest is handled by the tool, including the reporting and dashboards. This way you are more dependent on the individual vendor, and it might also be more expensive (prices are often intransparent and hard to get), but you are up and running very fast.

If you go one step further, let's say you choose one of the above technologies, you will most probably run into the need for intermediate steps in between: preparing, wrangling, cleaning and copying the data from one system or format to another, especially if you work with unstructured data, which sooner or later needs to be mingled into a structured form. To keep the overview and handle all these challenging tasks, you need an Orchestrator and a cluster-computing framework, which I explain in the two following chapters to complete the full architecture.

Orchestrators

After you have chosen your group, and perhaps even the specific technology you want to go for, you will want an Orchestrator. This is one of the most critical pieces, and it gets forgotten most of the time.

What is an Orchestrator?

An orchestrator is a tool for scheduling and monitoring workflows. For different technologies and different file formats to work together, you need an orchestrator and a processing engine that prepares, moves and wrangles the data correctly and gets it to the right place.

Why would you need this?

As companies grow, their workflows become more complex, comprising many processes with intricate dependencies that require increased monitoring, troubleshooting, and maintenance. Without a clear sense of data lineage, accountability issues can arise, and operational metadata can be lost. This is where these tools come into play with their directed acyclic graphs (DAGs), data pipelines, and workflow managers.

Complex workflows can be represented as DAGs: graphs in which information travels between vertices in a specific direction and can never loop back to its starting point. The building blocks of DAGs are data pipelines, sequences of processes in which the output of one process becomes the input of the next. Building these pipelines can be tricky, but luckily there are several open-source workflow managers available that allow programmers to focus on individual tasks and their dependencies.
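The execution order of such a DAG can be sketched with Python's standard library. The pipeline tasks here are hypothetical placeholders; real orchestrators add scheduling, retries and monitoring on top of exactly this dependency structure:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean":   {"extract"},
    "enrich":  {"extract"},
    "load":    {"clean", "enrich"},
}

# A topological sort yields an order in which every task runs
# only after all of its dependencies have finished.
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # e.g. ['extract', 'clean', 'enrich', 'load']
```

Note that `clean` and `enrich` have no mutual dependency, so an orchestrator is free to run them in parallel.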

Cluster-computing frameworks

  • Apache Spark (→ Databricks / Azure Databricks )
  • Apache Flink (main difference to Spark is that Flink was built from the ground up as a streaming product. Spark added Streaming onto their product later)
  • Dask (distributed Python with API compatibility for pandas, numpy and scikit-learn).

To complete the list, we also need to address the computing frameworks, the best known of which is probably Spark. Spark and other cluster-computing frameworks are unified analytics engines for large-scale data processing, which means you can wrangle, transform and clean your data at scale with a high degree of parallelisation. They can also be used and started from within the above-mentioned orchestrator tools.
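As a loose illustration of the parallel-map idea behind these engines, here is a plain-Python sketch using a thread pool. The `clean` transformation and the records are made up; a real cluster framework distributes the same pattern over many machines and much larger partitions:

```python
from concurrent.futures import ThreadPoolExecutor

def clean(record):
    # A toy transformation: trim whitespace and normalise case.
    return record.strip().lower()

records = ["  Alice ", "BOB", " Carol"]

# The executor maps the transformation over the records in parallel,
# conceptually similar to what Spark does across a cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    cleaned = list(pool.map(clean, records))

print(cleaned)  # ['alice', 'bob', 'carol']
```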

Conclusion

It is not only about choosing the right technology; it is also critical to correctly define the requirements and goals you want to achieve with your data ecosystem. Based on these definitions, you select a technology from the right category: a one-to-one OLAP replacement, a faster Cloud Data Warehouse backend, a virtualised semantic layer, or a serviced cloud provider.

If you include the featured tools, such as the orchestrator and the cluster-computing framework, from the very beginning, then you will surely have a good fit for your use case that also survives the near future.


Appendix

Comparison modern OLAP Technologies

Republished on LinkedIn and Medium.

Comment by Hugo Delgadinho on 2019-08-05 15:40

Thank you for your post, you gave a modern approach to replacing OLAP cubes, but I think we can see these old features with a modern vision, as explained in this article https://www.imaginarycloud.com/blog/oltp-vs-olap/ .

Comment by Wade on 2019-09-21 22:55

You left out tabular analysis services which is really good.

Comment by Simon on 2019-09-23 11:45

Hi Wade, thanks for commenting. But actually, under "Why replace OLAP-Cubes?", I stated that Tabular Cubes have similar problems and limitations to the Multidimensional ones. Search for “Tabular” and you’ll find it :-).


FAQs

Is OLAP still being used?

Is OLAP Obsolete? While OLAP cubes (or business intelligence cubes) are now unnecessary, it's important to note that OLAP workloads are in no way obsolete. OLAP itself enables the flexible multidimensional data analysis that leading organizations use every day.

What is replacing OLAP cubes?

In-memory analytics eliminates the need to store pre-calculated data in the form of OLAP cubes or aggregate tables. It offers business-users faster analysis, and access to analysis of large data sets, with minimal data management requirements.

What is OLAP good for?

Online analytical processing (OLAP) is software technology you can use to analyze business data from different points of view. Organizations collect and store data from multiple data sources, such as websites, applications, smart meters, and internal systems.

What are the five operations used in OLAP?

The five types of OLAP operations are drill down, roll up, slice, dice, and pivot.
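Three of these operations can be sketched in plain Python on toy data. Real OLAP engines run them against pre-aggregated multidimensional storage, not list comprehensions; this is purely to show what each operation means:

```python
rows = [
    {"year": 2023, "region": "EU", "product": "A", "sales": 120},
    {"year": 2023, "region": "US", "product": "A", "sales": 200},
    {"year": 2024, "region": "EU", "product": "B", "sales": 90},
    {"year": 2024, "region": "US", "product": "B", "sales": 110},
]

# Slice: fix a single dimension (here, year = 2023).
slice_2023 = [r for r in rows if r["year"] == 2023]

# Dice: fix values for several dimensions at once.
dice = [r for r in rows if r["year"] == 2024 and r["region"] == "EU"]

# Roll up: aggregate a dimension away (total sales per year).
rollup = {}
for r in rows:
    rollup[r["year"]] = rollup.get(r["year"], 0) + r["sales"]

print(len(slice_2023), dice[0]["sales"], rollup)  # 2 90 {2023: 320, 2024: 200}
```

Drill down is the inverse of roll up (adding a dimension back in), and pivot rotates which dimensions appear as rows versus columns in the result.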

Does Netflix use OLAP?

Netflix uses the OLAP querying functionality of Druid to quickly slice data into regions, availability zones, and time windows to visualize it and gain insight into how the network is behaving and performing.

Does Snowflake use OLAP?

Snowflake is a fully managed platform with unique features that make it an ideal solution to support data processing and analysis. Snowflake uses OLAP as a foundational part of its database schema and acts as a single, governed, and immediately queryable source for your data.

Is there any alternative technology to OLAP?

Both MOLAP and HOLAP are flavors of OLAP. In the case of MOLAP, data is stored in the form of multidimensional cubes using proprietary formats. These cubes ensure fast data retrieval at the time of serving queries.

Do people still use OLAP cubes?

OLAP is far from dead. It remains relevant, even in the cloud era, storing data in multidimensional structures, providing semantic definitions, and taking on an important role in data analytics and management on the data lake.

What are the disadvantages of OLAP cubes?

Some of the disadvantages of OLAP are mandatory pre-modeling, great dependence on IT, poor computation capability, slow reaction times, limited interactive-analysis ability, an abstract model, and considerable potential risk.

Why is data mining better than OLAP?

Data mining is often used to identify patterns and trends in data that would not be visible to the naked eye. This information can then be used to make predictions about future events, improve decision-making, or identify new opportunities. OLAP is used to answer specific questions about data.

When would a company use OLAP?

Common uses of OLAP include data mining and other business intelligence applications, complex analytical calculations, and predictive scenarios, as well as business reporting functions like financial analysis, budgeting, and forecast planning.

Which database is best for OLAP?

Some of the best databases for OLAP are:
  • Amazon Redshift: A cloud-based data warehouse with columnar storage, MPP architecture, and built-in features for OLAP.
  • Google BigQuery: A fully-managed cloud data warehouse with a columnar storage engine and MPP capabilities, designed for OLAP workloads.

What is an example of OLAP in real life?

Marketing: Industries like digital marketing, health care, eCommerce, and finance use OLAP in their marketing. Example: Market Basket Analysis is a technique that carefully studies the purchases a customer makes in a supermarket, identifying patterns of frequently purchased items.

What are the 3 types of OLAP?

There are three main types of OLAP: MOLAP, HOLAP, and ROLAP. These categories are mainly distinguished by the data storage mode. For example, MOLAP is a multi-dimensional storage mode, while ROLAP is a relational mode of storage. HOLAP is a combination of multi-dimensional and relational elements.

What is the conclusion of OLAP?

OLAP enables users to search and filter data easily, facilitating quick and targeted analysis. OLAP serves as a foundation for business modeling, data mining, and performance reporting tools. OLAP empowers users to perform slice and dice operations, exploring data from different dimensions and perspectives.

Do we need OLAP?

Consider OLAP in the following scenarios: You need to execute complex analytical and ad hoc queries rapidly, without negatively affecting your OLTP systems. You want to provide business users with a simple way to generate reports from your data.

Which companies use OLAP?

Companies Currently Using Oracle OLAP
Company Name | Website | Sub Level Industry
U.S. Cellular | uscellular.com | Telephony & Wireless
VOIS | dvois.com | Software Manufacturers
National General Insurance | nationalgeneral.com | Insurance
Port of Seattle | portseattle.org | Marine Shipping & Transportation
(2 more rows)

Is OLAP better than OLTP?

When to use OLAP vs. OLTP. Online analytical processing (OLAP) and online transaction processing (OLTP) are two different data processing systems designed for different purposes. OLAP is optimized for complex data analysis and reporting, while OLTP is optimized for transactional processing and real-time updates.

Is OLAP necessary?

The main advantage of OLAP is the speed of query execution. A correctly designed cube usually processes a typical user query within 5 seconds. The data will always be right at your fingertips to refer to while there is a necessity to rapidly take an important decision.
