Kevin and I are pleased to welcome back Kirk Munroe for the second blog in his series about data modeling with Tableau. Kirk lives in Halifax, Nova Scotia, Canada and is a business analytics and performance management expert. He is currently one of the owners and principal consultants at Paint with Data, a visual analytics consulting firm, along with his business partner and wife, Candice Munroe, a Tableau User Group Ambassador and former board member for Viz for Social Good.
Kirk is also the author of Data Modeling with Tableau, an extensive guide, complete with step-by-step explanations of essential concepts, practical examples, and hands-on exercises. The book details the role that Tableau Prep Builder and Tableau Desktop each play in data modeling. It also explores the components of Tableau Server and Cloud that make data modeling more robust, secure, and performant. This blog will give you a taste of what’s available in the book, so if you find this useful, go buy the book! Also be sure to check out Kirk’s previous blog, 4 Common Tableau Data Model Problems…and How to Fix Them.
Desktop vs Prep
When modeling data in Tableau, one of the most common questions I hear is, “when do I use Tableau Prep and when should I use Desktop?” In this post, I am going to do my best to answer this question in a pragmatic manner.
Why pragmatic? Data modeling often ends up in a very theoretical, almost dogmatic, conversation. In the context of Tableau, this often means that many of the items discussed here would ideally be done earlier in the data pipeline. I acknowledge that many of these items might be in the vein of “once the horse is out of the barn” but, in my experience, data engineering teams are regularly overtaxed and often have strict development processes that result in weeks (or months) for change requests. Business users don’t want to wait that long. So, we are going to make three assumptions before diving into our scenarios:
1) It’s In Your Hands - You, the reader, are tasked with coming up with an analysis for decision-makers in your organization and you don’t have the ability, funding, or time to go back to an earlier point in the data pipeline.
2) You have a Tableau Creator License (which gives you access to both Desktop and Prep) and don’t have access to other data prep tools. In other words, we are going to solve these use cases using only Tableau products.
3) Your organization has Tableau Prep Conductor on Tableau Server or Cloud. Prep Conductor is part of the Data Management Add-On and, without it, you cannot easily schedule Prep workflows to refresh automatically. Without this, the choice of Prep vs Desktop becomes more difficult as you can easily refresh data models created in Desktop, but Prep would require you to either refresh them manually or to use unwieldy batch files.
We are going to use the language of Prep and Desktop for brevity. Prep represents the Tableau Prep Builder client software or the New Flow capability on the web. Desktop represents the Tableau Desktop client software or the New Workbook and New Published Data Source capabilities on the web.
One other important concept to always keep in mind is the trade-off between the cost of a query and the user experience (UX) for your end users. What is meant by the cost of a query? The simplest way to think of this is how long the query takes to run. A “cheap” query runs quickly (e.g. subsecond) and an “expensive” query takes longer to run. In the cloud computing era, this can often map exactly to the monetary metaphor it represents if you are paying by server utilization. The concept is still important with on-premises database servers as a longer query means longer queues for subsequent queries. We want to optimize our queries to be as quick (cheap) as possible, but that isn’t always achievable. In those cases, we want to pay the cost of these queries as infrequently as possible.
What does all this mean in the context of Prep vs Desktop? Prep will typically pay the cost of the query once per output–commonly once a day. In the case of a live connection, Desktop will pay the cost of the query every time someone interacts with the viz. This means the more expensive the query, the better suited it is for Prep.
Of course, when we’re working with extracts, that’s not exactly the case. In our scenario, we’re most likely going to output our Prep flows to a hyper extract. You can, of course, do the same thing with Desktop. By doing this, we’re reducing the cost of the query to once, just like in Prep. Of course, an extract is, in fact, a database, so Tableau is always executing queries against it. But, at that point, the question of Prep vs Desktop becomes less about how many times a query is run and more about the complexity of those queries, which we’ll touch on in these scenarios.
What about the end-user experience? In many situations, this is what you have to trade off against query cost. Let’s take a common use case that we won’t cover in this post. In the Designing Efficient Workbooks whitepaper (which is an amazing resource, so I highly recommend having it handy at all times!), the authors talk about how expensive “only relevant values” filters are in Tableau. This is a trade-off I would make for user experience every time (assuming I couldn’t get rid of the quick filters on the dashboard altogether 😊). When a business user comes across a quick filter with values that, if selected, result in a viz with no data, the experience is frustrating and often appears to be a bug. It can result in them not trusting or even using your viz. Worth the trade-off in performance! Of course, a slow-running query is not good for user experience either, so there’s a delicate balance for us to strike.
Scenarios
Now let’s jump into our scenarios. For each, I’ll provide one of the following recommendations:
Use Prep - Only Prep has the feature(s) needed.
Lean Prep - This is a case where it can be done in either, but it is better done in Prep. If you’re already building a Prep flow for other purposes, then I’d highly recommend pushing this particular functionality into that flow.
Either - Both products work just fine, so you’ll need to consider the cost of the query and user experience, then pick the tool that makes the most sense for your use case.
Lean Desktop - It can be done in either, but it is better done in Desktop. If you’re not already building a Prep flow for other purposes, then it’s probably not worth creating one specifically for this scenario.
Use Desktop - Only Desktop has the feature(s) needed.
I’ve sorted these scenarios from Use Prep to Use Desktop as that is the order we would typically encounter them in the data pipeline.
1) Pivot Rows to Columns - Use Prep
Why: While you can sort of force Desktop to do something similar to this, the Pivot Rows to Columns feature only exists in Prep.
When might you encounter it: When your data is too narrow, as described in 4 Common Tableau Data Model Problems…and How to Fix Them.
Exceptions: None. Trying to deal with this in Desktop results in bad UX and costly queries.
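If it helps to see the shape change itself, here is a rough pandas sketch of what a rows-to-columns pivot does to a “too narrow” table. This is purely an illustration of the transformation, not how Prep implements it, and all table and column names are invented:

```python
import pandas as pd

# A "too narrow" table: one row per (order, measure) pair.
narrow = pd.DataFrame({
    "order_id": [1, 1, 2, 2],
    "measure":  ["Sales", "Profit", "Sales", "Profit"],
    "value":    [100.0, 20.0, 250.0, 60.0],
})

# Pivot rows to columns: one row per order, one column per measure.
wide = narrow.pivot(index="order_id", columns="measure", values="value").reset_index()
print(wide)
```

Each distinct value in the "measure" column becomes its own column, which is exactly the reshaping Prep’s Pivot step performs for you visually.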
2) Cleaning Data - Use Prep
Why: This feature is somewhat limited in Desktop. Prep has robust machine learning algorithms to clean and profile data as you work.
When might you encounter it: When the source systems that produce your data don’t properly enforce text field input validation. Or, in general, you’re working with some sort of messy data set. In other words, a lot!
Exceptions: There are a few scenarios where, if you have only a little bit of data cleaning to do, you might be able to knock that out in Desktop. But, I’d recommend against that as data profiling is built into Prep, allowing you to catch cleanliness problems you would have never even noticed otherwise.
3) Date Scaffolding - Use Prep
Why: While possible to do this in Desktop, it’s very complicated. Prep, on the other hand, includes easy-to-use native functionality for date scaffolding.
When might you encounter it: When your data is too short, as described in 4 Common Tableau Data Model Problems…and How to Fix Them.
Exceptions: None
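To make the idea of scaffolding concrete, here is a small pandas sketch of the underlying operation: build a complete “spine” of dates and left-join the sparse data onto it. The data and column names are made up for illustration; Prep gives you this as native functionality:

```python
import pandas as pd

# Sparse data: not every date has a row.
sales = pd.DataFrame({
    "date":  pd.to_datetime(["2023-01-01", "2023-01-04"]),
    "sales": [120.0, 80.0],
})

# Build a date scaffold: one row per day across the full range.
scaffold = pd.DataFrame({"date": pd.date_range(sales["date"].min(),
                                               sales["date"].max(),
                                               freq="D")})

# Join the sparse data onto the scaffold so missing days show up explicitly.
result = scaffold.merge(sales, on="date", how="left").fillna({"sales": 0.0})
print(result)
```

The point of the scaffold is that every date now exists as a row, so charts and running calculations no longer skip over the missing days.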
4) Row-Level Calculations - Lean Prep
Why: A row-level calculation is always an option in Prep as it has all rows and the calculation is not dependent on the viz level-of-detail. If you do these calculations in Desktop, you might pay the price for the same query multiple times. (As noted earlier, in the case of extracts, Desktop will materialize these calculations. In other words, it will only pay the cost of the query when it rebuilds the extract, the same as Prep. However, it is not always easy to predict when Hyper materializes a calculation and when it doesn’t.)
When might you encounter it: It’s pretty rare to not have any need for row-level calculations, so it’s likely you’ll have to deal with this almost every time.
Exceptions: If you aren’t otherwise using Prep to build your data model and you test in Desktop (or with Performance Recorder) and it runs quickly.
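What makes a calculation “row-level” is that each row’s result depends only on that row, never on what is on the viz. A tiny pandas sketch with invented fields shows why this kind of calculation can safely be computed once, upstream:

```python
import pandas as pd

# Hypothetical orders table.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2023-01-01", "2023-01-03"]),
    "ship_date":  pd.to_datetime(["2023-01-04", "2023-01-05"]),
})

# A row-level calculation: days to ship is computed per row and does not
# change depending on how the data is later aggregated or visualized.
orders["days_to_ship"] = (orders["ship_date"] - orders["order_date"]).dt.days
print(orders["days_to_ship"].tolist())
```

Because the result never changes with the viz level of detail, computing it once in a Prep flow means you pay the cost of the calculation only when the flow runs.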
5) Integration with Data Science Models - Either
Why: If the data science model is intended to run once per refresh of the data model, use Prep. If it is being used interactively based on user input, use Desktop. For example, you might need to score a large amount of data for customer sentiment; in this case, since it’s being done once, you can use Prep. Another example is interactively deciding on a customer’s propensity to churn based on parameter inputs; in this case, the output is dependent upon user interaction, so you must use Desktop.
When might you encounter it: Any time you’re using an analytics extension (R, Python, etc.) to process data.
Exceptions: N/A
6) Filtering Data - Either
Why: Both Prep and Desktop make it easy to filter both rows and columns of data. When filtering is needed to create your data model (for example, maybe you only need to show the last 3 years), then Prep is best. When filtering needs to be interactive for your user, then you must use Desktop.
When might you encounter it: Filters are used in almost everything I create, so you’ll encounter this in most scenarios. When deciding whether to use Prep or Desktop, you should focus on whether or not those filters need to be available to users.
Exceptions: N/A
7) Union Data - Either
Why: Both Prep and Desktop make it easy to union data. My preference is to do this in Prep if possible as it will greatly simplify your resulting Tableau Desktop data model and could help with query performance.
When might you encounter it: Fairly often. Especially when legacy systems dump data to flat files for ingestion into analytics tools like Tableau.
Exceptions: N/A
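A union simply stacks rows from tables that share the same columns, which is exactly the flat-file case above. Here is a minimal pandas sketch with invented table names:

```python
import pandas as pd

# Two yearly extracts with the same schema (hypothetical file layout).
sales_2022 = pd.DataFrame({"order_id": [1, 2], "sales": [100.0, 250.0]})
sales_2023 = pd.DataFrame({"order_id": [3], "sales": [75.0]})

# A union stacks the rows of tables that share the same columns.
all_sales = pd.concat([sales_2022, sales_2023], ignore_index=True)
print(len(all_sales))
```

Doing this once in Prep means Desktop sees a single clean table instead of a union definition it has to resolve on every query.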
8) Joining Tables that Don’t Cause Aggregation Issues - Either
Why: Both Prep and Desktop make it easy to join tables. Most joins do not cause aggregation issues as you will be joining dimension tables to fact tables. An example would be a Superstore-like scenario in the “real world.” Superstore is already a highly denormalized table. It would have likely been created by joining a fact table of sales with a few dimension tables. For instance, in terms of product analysis, the fact (sales) table would have only had Product ID. This data would have been joined with a product table with fields for Category, Subcategory, and Product Name. This is a pretty common type of join and does not cause aggregation issues as the result of the join leaves the level of aggregation the same as it was before the join–individual sales at the product level.
When might you encounter it: Frequently. It’s very common to need to join multiple tables together.
Exceptions: Not an exception, but this might have a slight “Lean Prep” because the Prep UI makes the results of your join immediately obvious in a visual manner. However, the nature of relationships might cause you to rethink this, as I’ll address later in this post.
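The defining property of this kind of join is that it is many-to-one: each fact row matches at most one dimension row, so the row count (and therefore the level of aggregation) is unchanged. A small pandas sketch with invented data illustrates this:

```python
import pandas as pd

# Fact table: one row per sale, carrying only a product ID.
sales = pd.DataFrame({"sale_id": [1, 2, 3],
                      "product_id": ["A", "A", "B"],
                      "amount": [10.0, 20.0, 5.0]})

# Dimension table: one row per product.
products = pd.DataFrame({"product_id": ["A", "B"],
                         "category": ["Furniture", "Technology"]})

# Many-to-one join: the fact table's row count is unchanged,
# so sums over the joined result are still correct.
joined = sales.merge(products, on="product_id", how="left")
print(len(joined), joined["amount"].sum())
```

Because no fact rows were duplicated, every aggregation you run against the joined table gives the same answer it would have against the original fact table.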
9) Pivot Columns to Rows - Either
Why: Both Prep and Desktop make it easy to pivot columns to rows. As with unioning, I prefer pushing this to Prep in order to simplify your Desktop data model. Prep also has a little more functionality here, particularly the ability to use wildcards when pivoting.
When might you encounter it: When your data is too wide, as described in 4 Common Tableau Data Model Problems…and How to Fix Them.
Exceptions: N/A
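This is the inverse of scenario 1: a “too wide” table with one column per period gets unpivoted into one row per period. A pandas sketch of the same reshaping, with made-up column names:

```python
import pandas as pd

# A "too wide" table: one sales column per year.
wide = pd.DataFrame({"region": ["East", "West"],
                     "sales_2022": [100.0, 150.0],
                     "sales_2023": [120.0, 130.0]})

# Pivot columns to rows: one row per (region, year) pair.
long = wide.melt(id_vars="region", var_name="year", value_name="sales")
print(len(long))
```

The tall result is what Tableau expects: a single measure column you can aggregate, with the year available as a dimension for filtering and axes.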
10) Aggregate Calculations - Use Desktop
Why: Aggregate calculations are typically dependent on the fields on the viz, so they should be done in Desktop. For example, consider profitability. You can create a SUM(Profit)/SUM(Sales) calculation in Desktop then use it to find profitability at the region level, customer level, or any level you choose–all using a single calculation. In Prep, you’d have to know those levels of detail ahead of time, then create individual fields for each one.
When might you encounter it: Aggregate calculations are very common and are typically used in most Tableau workbooks.
Exceptions: MIN and MAX calculations could make sense in Prep. Imagine that you want to know when a customer first bought a product, using Superstore as an example. The calculation { FIXED [Customer Name] : MIN([Order Date]) } is not dependent on the level of the viz. Why not ensure you only pay the cost of that query once and put it in Prep?
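One way to see why a ratio like SUM(Profit)/SUM(Sales) cannot be pre-computed per row is that averaging row-level ratios gives a different answer than dividing the sums at the viz level of detail. A pandas sketch with invented numbers demonstrates the difference:

```python
import pandas as pd

# Illustrative sales rows.
df = pd.DataFrame({"region": ["East", "East", "West"],
                   "sales":  [100.0, 300.0, 200.0],
                   "profit": [10.0, 90.0, 40.0]})

# Correct aggregate calculation, computed at query time per region:
by_region = df.groupby("region").agg(sales=("sales", "sum"),
                                     profit=("profit", "sum"))
ratio = by_region["profit"] / by_region["sales"]      # East: 100/400 = 0.25

# A pre-computed row-level ratio, averaged up, gives a different (wrong) answer:
df["row_ratio"] = df["profit"] / df["sales"]
avg_ratio = df.groupby("region")["row_ratio"].mean()  # East: (0.10+0.30)/2 = 0.20
```

This is why the ratio has to be computed from the sums at whatever level the viz asks for, which is exactly what Desktop does with a single SUM(Profit)/SUM(Sales) field.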
11) “Conditional” Calculations - Use Desktop
Why: These are calculations that are “conditional” depending on what is placed on the view or based on some data input. Generally speaking, these tend to be table calculations (which will always compute differently based on the dimensions on the view) and some types of level-of-detail (LOD) calculations, particularly INCLUDE and EXCLUDE. In such cases, Desktop must be used.
When might you encounter it: It’s somewhat often that you’ll need to use these types of calculations in your work.
Exceptions: In some simple cases where a calculation is conditional on a single value, Prep could be used (particularly using a parameter). There are also other scenarios where the only option in Desktop is the use of a table calculation, but Prep could be used to limit such a need. For a good example of such a scenario, see the Get Data from a Prior Row/Partition section of Ken’s blog, 3 Creative Uses for Relationships/Joins in Tableau. The blog shows how to address the problem using Desktop, but we think it would be more effective to handle this upfront in Prep. The solution pushes the work to the data model, resulting in the elimination of any need for complicated table calculations.
12) Extending the Data Model (Sets, Folders, Hierarchies, Custom Formatting, etc.) - Use Desktop
Why: There is so much you can do with your data model in Desktop that you cannot create in Prep, including creating Sets and Hierarchies, organizing data into folders, applying default formatting (and other settings) to data items, etc.
When might you encounter it: All the time! This doesn’t mean Prep can’t be your go-to modeling tool. Just enhance your model after Prep runs the flow. The good news is that all your changes are maintained when the flow is run in the future.
Exceptions: N/A
13) “Joining” Tables that Result in Aggregation Issues - Use Desktop
Why: This isn’t possible in Prep. It is done in Desktop with relationships and not joins, hence the “joining” in the use case title. Let’s take the case of getting sales targets for each subcategory at the monthly level. Your consumers are looking to see how sales performed against targets. If you join these data, you are going to end up with issues because the sales data is at the transactional level, but targets are at the monthly level. This means you will either have to: (i) pre-aggregate your sales data before making the join (which limits detailed analysis of sales) or (ii) be really careful with the aggregation levels when creating sheets in Tableau. These data at different levels of aggregation are handled pretty seamlessly by relationships, which we will talk about shortly.
When might you encounter it: Somewhat regularly. While this is not needed in all analyses, it is not an uncommon requirement. When it occurs, it can be difficult to determine the best approach.
Exceptions: N/A
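To see the problem a plain join causes here, consider this pandas sketch (invented numbers): joining monthly targets onto transaction-level sales duplicates each target onto every transaction in that month, so naively summing the target column inflates it.

```python
import pandas as pd

# Transaction-level sales vs monthly targets.
sales = pd.DataFrame({"month": ["2023-01", "2023-01", "2023-02"],
                      "amount": [40.0, 60.0, 50.0]})
targets = pd.DataFrame({"month": ["2023-01", "2023-02"],
                        "target": [90.0, 55.0]})

# The join duplicates each monthly target onto every transaction row,
# so summing the target column after the join double-counts it.
joined = sales.merge(targets, on="month")
naive_target = joined["target"].sum()   # 90 + 90 + 55 = 235
true_target = targets["target"].sum()   # 145
```

Relationships avoid this by keeping the two tables at their own levels of detail and aggregating each one correctly before combining them for the viz.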
14) Live Connections to Your Data - Use Desktop
Why: If you need a live connection to your data, then you cannot run it through a Prep flow ahead of time as Prep always outputs the data from flows into new tables/files, which are essentially point-in-time snapshots.
When might you encounter it: There are many use cases for live connections, especially with the emergence of high-performance cloud data warehouses. However, it’s important to only use live connections when you really need them because this can be a big performance hit.
Exceptions: There are no true exceptions as only Desktop can connect live, but there may be cases where you create flows in Prep, output to a database instead of a published data source, and have Desktop connect live to that data. Or you may be able to schedule Prep flows to run frequently enough to create a data source that is near-real-time enough to meet your users’ requirements.
The Impact of Relationships
Before closing out this blog, I think it’s important to briefly touch on the impact of relationships, a feature added to Tableau Desktop in version 2020.2. As discussed by Ken in his blog, Tableau's New Data Model, relationships provide a more flexible way to join your data that is dependent upon the dimensions and measures used in your view. Essentially, relationships cause Tableau to perform what Ken calls “Smart SQL”, only joining the tables needed to meet the needs of a specific view. When using a live connection, this can help to improve query performance significantly by “culling” unneeded joins and tables. And, since extracts are also a type of database, relationships have a similar impact when you’re connected to extracts.
Obviously, since relationships are largely viz-dependent, they don’t make sense in Tableau Prep. But their existence may cause you to reconsider how you build your data model in Prep. Let’s say, for example, you have two tables that you want to combine. One has 10 thousand rows and will be used in every worksheet you create in Tableau. The other has 10 million rows and will only be used on one sheet. You could join the tables in Prep, but your resulting data model will have at least 10 million rows. Once you’re working with that data in Tableau Desktop, it will need to process all those rows for all of your sheets. With relationships, on the other hand, Tableau only queries the large table on the one sheet that actually uses it. This, of course, is the beauty of relationships. But this does not mean that you should dispense with Prep. Instead, you should just rethink your Prep workflow slightly. Instead of joining these within the workflow, you can work with both tables separately. Clean up and extend each table in the flow, but output them to their own tables/files. Then, in Desktop, connect those outputs using relationships, so that Tableau will cull the large table when it’s not in use.
Summary
There you have it–14 Tableau data modeling scenarios and which Tableau product to use for each of them. As a companion to the blog, I’ve created a Tableau Public workbook to act as a cheat sheet.
My book, Data Modeling with Tableau, covers these and many other use cases and scenarios. In addition to modeling in Tableau Desktop and Prep Builder, it covers virtual connections, securing and sharing data models, creating models in the Tableau web interfaces, data model considerations for Ask and Explain Data, and data catalog and lineage. It is now available on Amazon in e-book and print formats.
Thanks for reading. If you have any questions or thoughts, feel free to drop a comment below and we will be glad to address them.
Kirk Munroe
February 20, 2023