OLAP, what’s coming next? (2024)



Are you on the lookout for a replacement for Microsoft Analysis Services cubes? Are you looking for a big data OLAP system that scales ad libitum, or do you want your analytics updated even in real time? In this blog, I want to show you possible solutions that are ready for the future and fit into your existing data architecture.

What is OLAP?

OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modelling. An OLAP cube is a multidimensional database optimised for data warehouse and analytical workloads: it stores data in multidimensional form, generally for reporting purposes, with data (measures) categorised by dimensions. To manage and query OLAP cubes, Microsoft developed a query language known as Multidimensional Expressions (MDX) in the late 1990s. Many other vendors of multidimensional databases have adopted MDX for querying data, but this specialised language means that managing a cube requires personnel with a correspondingly specialised skill set.
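The idea of measures categorised by dimensions can be sketched in a few lines of plain Python. This is a toy illustration with made-up sales records, not how a cube engine is actually implemented:

```python
from collections import defaultdict

# Toy fact records: three dimensions (year, region, product), one measure (sales).
rows = [
    {"year": 2023, "region": "EU", "product": "A", "sales": 120},
    {"year": 2023, "region": "EU", "product": "B", "sales": 80},
    {"year": 2023, "region": "US", "product": "A", "sales": 200},
    {"year": 2024, "region": "EU", "product": "A", "sales": 150},
]

def aggregate(rows, dims):
    """Sum the 'sales' measure, grouped by the given dimensions."""
    cube = defaultdict(int)
    for r in rows:
        key = tuple(r[d] for d in dims)
        cube[key] += r["sales"]
    return dict(cube)

# Total sales per region, across all years and products:
print(aggregate(rows, ["region"]))  # {('EU',): 350, ('US',): 200}
```

A real cube pre-computes and stores many such aggregations so that any combination of dimensions can be answered instantly; this is what MDX queries against.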

Why replace OLAP-Cubes?

In my job as a business intelligence specialist and data engineer, I have always used OLAP technologies for quick ad-hoc analysis with tools like Tableau, TARGIT, Power BI, Excel, etc. As the Microsoft BI stack is still widely used in small to large companies, the technology behind it is mostly SQL Server Analysis Services (SSAS), i.e. Microsoft cubes. These are a compelling and fast way to aggregate and pre-calculate all of your corporate data and make it available to your operational business users. Nevertheless, these cubes are increasingly reaching their limits, as you might have experienced in your own company. These are the problems I see or have encountered:

  • The multidimensional model of Analysis Services is not supported in the Azure cloud, and it does not look like it will be any time soon (there is the tabular model of Microsoft cubes, but it shares several of the following limitations).
  • There is a limit to the amount of data it can process quickly. It was developed a long time ago and is not optimised for today's vast data volumes.
  • It does not fit into the open-source and big data ecosystem because:
    • It is not distributed and cannot be parallelised like modern technologies (containers, clusters, etc.).
    • Cubes are added or edited in Microsoft Visual Studio, which is hard to source-control and makes it impossible for any data-savvy person apart from the BI specialist to modify them.
  • The query language MDX is considerably hard to understand, and more complex queries are difficult to write.

Therefore I researched alternatives on the world wide web. But before I go into more depth about what I found, I want you to look at the architecture as a whole. One reason is that nowadays countless people wish to make use of data, so you might want to make it available to all of them, or at least have a more open architecture. In my opinion, the Data Lake is the buzzword for this: everyone can easily access the data, instead of only BI specialists or, worse, only the infrastructure people with access to the FTP server.

To illustrate this, I like the architecture from DWBI1, which shows it as generically as possible:

[Figure: generic architecture with Data Warehouse and Data Lake]

What you also see is the difference between corporate data, which traditionally goes into the Data Warehouse (DWH), and real-time streaming data such as social media and IoT, which goes into the data lake. As Data Lakes make it easier for data-savvy people to access data, you should always keep in mind where you put which data. If you want to know more about that topic, read my other blog post about Data Warehouse vs Data Lake.

Back to the OLAP cubes, which in the architecture above sit on top of the Data Warehouse as In-Memory Models or a Semantic Layer. As you can see, OLAP is still used for essential tasks such as data exploration, analytics and self-service BI. I would say that Data Lakes are not fast enough for these needs, where speed and short response times are critical in most organisations. On the other hand, Data Lakes are very beneficial for Data Scientists, because they are interested in all the data and do not depend on sub-second query responses. A Data Scientist can therefore explore, analyse and run machine learning models on top of the Data Lake very conveniently.
But what is coming next after, e.g., Microsoft Analysis Services? In the following chapters, I show you the different technologies that can serve as a modern, scalable and fast replacement.

List of Cube-Replacements

Because you can make a one-to-one replacement, switch to a faster cloud backend, virtualise a semantic layer, or use a serviced cloud provider, I have categorised the different technologies into the following groups.

OLAP-Technologies

With OLAP technologies, you replace your cubes one-to-one with another technology. You don't change anything in your current architecture; you simply replace your cubes with a modern, big-data-optimised technology that focuses on the fastest possible query response times. See the Appendix for a comparison of modern OLAP technologies.

Cloud Data Warehouses

Another approach is to move your on-premise Data Warehouse to a Cloud Data Warehouse to get more scalability, more speed and better availability. This option is best suited for you if you do not necessarily need the fastest response times and you do not have tera- or petabytes of data. The idea is to speed up your DWH and skip the cube layer entirely. This way you save a lot of time developing, processing and maintaining cubes. On the other hand, you pay for it with higher query latency while you create your dashboards. If you mainly have reports anyway, which can be run beforehand, then this is perfect for you.

Data Virtualisations

If you have many source systems built on different technologies, but all of them respond rather quickly and you don't run a lot of operational applications on them, you might consider Data Virtualisation. In this approach you don't move, copy or pre-aggregate data; instead, you have a semantic middle layer where you create your business models (like cubes), and only when you query this virtualisation layer does it query the underlying data sources. If you use, e.g., Dremio, it relies on Apache Arrow technology, which caches and optimises a lot in-memory for you, so that you get astonishingly fast response times as well.
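To make the lazy-querying idea concrete, here is a hypothetical, much simplified Python sketch of what a virtualisation layer does: the source is only queried on first access, and later reads are served from an in-memory cache. The `VirtualView` class and `load_from_source` function are illustrative, not any vendor's API:

```python
class VirtualView:
    """A toy semantic-layer view: queries its source lazily and caches."""

    def __init__(self, query_source):
        self._query_source = query_source  # callable hitting the real source
        self._cache = None

    def rows(self):
        if self._cache is None:            # first read hits the source...
            self._cache = self._query_source()
        return self._cache                 # ...later reads come from the cache


calls = {"count": 0}

def load_from_source():
    # Stand-in for a real query against a remote system.
    calls["count"] += 1
    return [("EU", 350), ("US", 200)]

view = VirtualView(load_from_source)
view.rows()
view.rows()
print(calls["count"])  # 1 -- the source was only queried once
```

Real products add query pushdown, reflections and distributed caching on top of this basic pattern, but the contract is the same: the business model is defined once, and the data stays in place.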

Serviced Cloud and Analytics

The last option is to buy from a serviced cloud storage or analytics vendor like Looker, Sisense or Panoply. These are very easy to use and create implicit cubes for you: you just join your data together in the semantic layer of the respective tool, and all the rest is handled by the tool, including the reporting and dashboards. This way you are more dependent on the individual vendor, and it might also be more expensive (prices are often intransparent and hard to get), but you are up and running very fast.

If you go one step further, let's say you choose one of the above technologies, you will most probably run into the need for intermediate steps in between: preparing, wrangling, cleaning and copying the data from one system or format to another, especially if you work with unstructured data, which sooner or later needs to be mingled into a structured form. To keep the overview and handle all these challenging tasks, you need an Orchestrator and a cluster-computing framework, which I explain in the two following chapters to complete the full architecture.

Orchestrators

After you have chosen your group, and perhaps even the specific technology you want to go for, you will want an Orchestrator. This is one of the most critical pieces, and it gets forgotten most of the time.

What is an Orchestrator?

An orchestrator is a tool for scheduling and monitoring workflows. For different technologies and different file formats to work together, you need an orchestrator and a processing engine that prepares, moves and wrangles the data correctly and gets it to the right place.

Why would you need this?

As companies grow, their workflows become more complex, comprising many processes with intricate dependencies that require increased monitoring, troubleshooting, and maintenance. Without a clear sense of data lineage, accountability issues can arise, and operational metadata can be lost. This is where these tools come into play with their directed acyclic graphs (DAGs), data pipelines, and workflow managers.

Complex workflows can be represented as DAGs: graphs in which information travels between vertices in a specific direction and can never loop back to its starting point. The building blocks of DAGs are data pipelines, sequences of processes in which the output of one process becomes the input of the next. Building these pipelines can be tricky, but luckily there are several open-source workflow managers available that allow programmers to focus on individual tasks and their dependencies.
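The execution order of such a DAG can be sketched with Python's standard library. The pipeline tasks here are hypothetical placeholders; real orchestrators add scheduling, retries and monitoring on top of exactly this dependency structure:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean":   {"extract"},
    "enrich":  {"extract"},
    "load":    {"clean", "enrich"},
}

# A topological sort yields an order in which every task runs
# only after all of its dependencies have finished.
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # e.g. ['extract', 'clean', 'enrich', 'load']
```

Note that `clean` and `enrich` have no mutual dependency, so an orchestrator is free to run them in parallel.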

Cluster-computing frameworks

  • Apache Spark (→ Databricks / Azure Databricks )
  • Apache Flink (main difference to Spark is that Flink was built from the ground up as a streaming product. Spark added Streaming onto their product later)
  • Dask (distributed Python with API compatibility for pandas, numpy and scikit-learn).

To complete the list, we also need to address the computing frameworks, the best known of which is probably Spark. Spark and other cluster-computing frameworks are unified analytics engines for large-scale data processing, which means you can wrangle, transform and clean your data at scale with a high degree of parallelisation. They can also be used and started from within the above-mentioned orchestrator tools.
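As a loose illustration of the parallel-map idea behind these engines, here is a plain-Python sketch using a thread pool. The `clean` transformation and the records are made up; a real cluster framework distributes the same pattern over many machines and much larger partitions:

```python
from concurrent.futures import ThreadPoolExecutor

def clean(record):
    # A toy transformation: trim whitespace and normalise case.
    return record.strip().lower()

records = ["  Alice ", "BOB", " Carol"]

# The executor maps the transformation over the records in parallel,
# conceptually similar to what Spark does across a cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    cleaned = list(pool.map(clean, records))

print(cleaned)  # ['alice', 'bob', 'carol']
```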

Conclusion

It is not only about choosing the right technology; it is also critical to correctly define the requirements and goals you want to achieve with your data ecosystem. Based on these definitions, you select a technology from the right category: a one-to-one OLAP replacement, a faster Cloud Data Warehouse backend, a virtualised semantic layer, or a serviced cloud provider.

If you include the featured tools, such as the orchestrator and the cluster-computing framework, from the very beginning, then you will surely have a good fit for your use case that also survives the near future.


Appendix

Comparison modern OLAP Technologies

Republished on LinkedIn and Medium.

Comment by Hugo Delgadinho on 2019-08-05 15:40

Thank you for your post, you gave a modern approach to replacing OLAP cubes, but I think we can see these old features with a modern vision, as explained in this article https://www.imaginarycloud.com/blog/oltp-vs-olap/ .

Comment by Wade on 2019-09-21 22:55

You left out tabular analysis services which is really good.

Comment by Simon on 2019-09-23 11:45

Hi Wade, thanks for commenting. But actually, under "Why replace OLAP-Cubes?", I stated that Tabular Cubes have similar problems and limitations to the Multidimensional ones. Search for “Tabular” and you’ll find it :-).


FAQs

Is OLAP still being used?

Is OLAP Obsolete? While OLAP cubes (or business intelligence cubes) are now unnecessary, it's important to note that OLAP workloads are in no way obsolete. OLAP itself enables the flexible multidimensional data analysis that leading organizations use every day.

What is replacing OLAP cubes?

In-memory analytics eliminates the need to store pre-calculated data in the form of OLAP cubes or aggregate tables. It offers business-users faster analysis, and access to analysis of large data sets, with minimal data management requirements.

What is OLAP good for?

Online analytical processing (OLAP) is software technology you can use to analyze business data from different points of view. Organizations collect and store data from multiple data sources, such as websites, applications, smart meters, and internal systems.

What are the five operations used in OLAP?

The five types of OLAP operations are drill down, roll up, slice, dice, and pivot.
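Three of these operations can be sketched in plain Python on toy data. Real OLAP engines run them against pre-aggregated multidimensional storage, not list comprehensions; this is purely to show what each operation means:

```python
rows = [
    {"year": 2023, "region": "EU", "product": "A", "sales": 120},
    {"year": 2023, "region": "US", "product": "A", "sales": 200},
    {"year": 2024, "region": "EU", "product": "B", "sales": 90},
    {"year": 2024, "region": "US", "product": "B", "sales": 110},
]

# Slice: fix a single dimension (here, year = 2023).
slice_2023 = [r for r in rows if r["year"] == 2023]

# Dice: fix values for several dimensions at once.
dice = [r for r in rows if r["year"] == 2024 and r["region"] == "EU"]

# Roll up: aggregate a dimension away (total sales per year).
rollup = {}
for r in rows:
    rollup[r["year"]] = rollup.get(r["year"], 0) + r["sales"]

print(len(slice_2023), dice[0]["sales"], rollup)  # 2 90 {2023: 320, 2024: 200}
```

Drill down is the inverse of roll up (adding a dimension back in), and pivot rotates which dimensions appear as rows versus columns in the result.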

Does Netflix use OLAP?

Netflix uses the OLAP querying functionality of Druid to quickly slice data into regions, availability zones, and time windows to visualize it and gain insight into how the network is behaving and performing.

Does Snowflake use OLAP?

Snowflake is a fully managed platform with unique features that make it an ideal solution to support data processing and analysis. Snowflake uses OLAP as a foundational part of its database schema and acts as a single, governed, and immediately queryable source for your data.

Is there any alternative technology to OLAP?

Both MOLAP and HOLAP are flavors of OLAP. In the case of MOLAP, data is stored in the form of multidimensional cubes using proprietary formats. These cubes ensure fast data retrieval at the time of serving queries.

Do people still use OLAP cubes?

OLAP is far from dead. It remains relevant, even in the cloud era, storing data in multidimensional structures, providing semantic definitions, and taking on an important role in data analytics and management on the data lake.

What are the disadvantages of OLAP cubes?

Some of the disadvantages of OLAP are mandatory pre-modeling, great dependence on IT, poor computation capability, slow reaction times, limited interactive-analysis ability, an abstract model, and considerable potential risk.

Why is data mining better than OLAP?

Data mining is often used to identify patterns and trends in data that would not be visible to the naked eye. This information can then be used to make predictions about future events, improve decision-making, or identify new opportunities. OLAP is used to answer specific questions about data.

When would a company use OLAP?

Common uses of OLAP include data mining and other business intelligence applications, complex analytical calculations, and predictive scenarios, as well as business reporting functions like financial analysis, budgeting, and forecast planning.

Which database is best for OLAP?

Some of the best databases for OLAP are:
  • Amazon Redshift: A cloud-based data warehouse with columnar storage, MPP architecture, and built-in features for OLAP.
  • Google BigQuery: A fully-managed cloud data warehouse with a columnar storage engine and MPP capabilities, designed for OLAP workloads.

What is an example of OLAP in real life?

Marketing: Industries like digital marketing, health care, eCommerce, and finance use OLAP in their marketing. Example: Market Basket Analysis is a technique that carefully studies the purchases a customer makes in a supermarket, identifying patterns of frequently purchased items.

What are the 3 types of OLAP?

There are three main types of OLAP: MOLAP, HOLAP, and ROLAP. These categories are mainly distinguished by the data storage mode. For example, MOLAP is a multi-dimensional storage mode, while ROLAP is a relational mode of storage. HOLAP is a combination of multi-dimensional and relational elements.

What is the conclusion of OLAP?

OLAP enables users to search and filter data easily, facilitating quick and targeted analysis. OLAP serves as a foundation for business modeling, data mining, and performance reporting tools. OLAP empowers users to perform slice and dice operations, exploring data from different dimensions and perspectives.

Do we need OLAP?

Consider OLAP in the following scenarios: You need to execute complex analytical and ad hoc queries rapidly, without negatively affecting your OLTP systems. You want to provide business users with a simple way to generate reports from your data.

Which companies use OLAP?

Companies Currently Using Oracle OLAP
Company Name | Website | Sub Level Industry
U.S. Cellular | uscellular.com | Telephony & Wireless
VOIS | dvois.com | Software Manufacturers
National General Insurance | nationalgeneral.com | Insurance
Port of Seattle | portseattle.org | Marine Shipping & Transportation
(2 more rows)

Is OLAP better than OLTP?

When to use OLAP vs. OLTP. Online analytical processing (OLAP) and online transaction processing (OLTP) are two different data processing systems designed for different purposes. OLAP is optimized for complex data analysis and reporting, while OLTP is optimized for transactional processing and real-time updates.

Is OLAP necessary?

The main advantage of OLAP is the speed of query execution. A correctly designed cube usually processes a typical user query within 5 seconds. The data will always be right at your fingertips to refer to while there is a necessity to rapidly take an important decision.
