SSAS: Taking the Tabular Journey

 


Introduction

A Getting-Started and Survival Guide for planning, designing and building Tabular Semantic Models with Microsoft SQL Server 2012 Analysis Services.

This article will be unique in that it will be a living document, updated and expanded over time. I will also post-as-I-go on the site about other things, but this particular post will live for a while. I have a lot of good intentions – I know that about myself, and I also know that the best way to get something done is to get it started, especially if I'm too busy with work and projects. If it's important, the "completing" part can happen later. In the case of this post, I'll take care of building it as I go, topic by topic. Heck, maybe it will never be "finished", but then, are we ever really done with IT business solutions? I have been intending to get started on this topic for quite some time but, in my very busy project schedule lately, didn't have a concise message for a post. I do, however, have a lot to say about creating and using tabular models.

I've added some placeholder topic headers for things that are on my mind. This list is inspired by the questions my consulting customers, students, IT staff members and business users ask me on a regular basis. It will motivate me to come back and finish these topics, and you to come back and read them. I hope you will post comments about your burning questions, issues and ideas for related topics to cover in this living post about tabular model design practices and recommendations.

 



Why Tabular?

SQL Server Analysis Services is a solid and mature platform that now serves as the foundation for two different implementations. Multidimensional models are especially suited for large volumes of dimensionally-structured data with additive measure values that sum up along related dimensional attributes & hierarchies.

By design, the tabular architecture is more flexible than multidimensional in a number of scenarios. Tabular works well with dimensional data structures, but it also works well in cases where the structure of the data doesn't resemble a traditional star or snowflake of fact and dimension tables. When I started using PowerPivot and tabular SSAS projects, I insisted on transforming data into star schemas like I've always done before building a cube. In many cases I still do, because it's easier to design a predictable model that performs well and is easy for users to navigate. A dimensional model has order and discipline; however, the data is not always shaped this way, and it can take a lot of effort to force it into that structure.

Tabular is fast not only for additive, hierarchically structured data; in many cases it also works well with normalized and flattened data, as long as all the data fits into memory and the model is designed to support simple relationships and calculations that take advantage of the formula engine and the VertiPaq compression and query engine. It's actually pretty easy to make tabular do silly, inefficient things, but it's not very hard to make it work really well, either.
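To make that last point concrete, here is a minimal sketch of the kind of thing I mean, using a hypothetical Sales table with Amount and Region columns (none of these names come from a real model). Forcing the formula engine to iterate a whole table row by row is the silly version; handing the storage engine a simple column filter is the fast one:

  -- Iterates every row of Sales in the formula engine:
  Slow West Sales := CALCULATE(SUM(Sales[Amount]), FILTER(Sales, Sales[Region] = "West"))

  -- Lets the VertiPaq storage engine resolve a simple column filter:
  Fast West Sales := CALCULATE(SUM(Sales[Amount]), Sales[Region] = "West")

The two forms aren't perfectly equivalent in every filter context, but the pattern holds: simple column filters give the engine room to do what it does best.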

James Serra has done a nice job of summarizing the differences between the two choices, highlighting the strengths and comparative weaknesses of each, in his April 4 blog post titled SQL Server 2012: Multidimensional vs Tabular. James points out that tabular models can be faster and easier to design and deploy, and that they consistently perform well without a lot of extra attention to tuning and optimization. Honestly, there isn't that much to maintain, and a lot of the tricks we use to make cubes perform better (like measure group partitioning, aggregation design, strategic aggregation storage, usage-based optimization, proactive caching and cache-warming queries) are simply unnecessary. Most of these options don't really exist in the tabular world. We do have partitions in tabular models, but they're really just there to make data loading and management easier, not to improve query performance.

 



What About Multidimensional – Will Tabular Replace It?

The fact is that multidimensional databases (which most casual SSAS users refer to as "cubes") will be supported for years to come. The base architecture for SSAS OLAP/UDM/Multidimensional is about 13 years old; Microsoft originally acquired the product code base from Panorama and then went on to enhance, and eventually rewrite, the engine over the years as it matured. In the view of many industry professionals, this is still the more complete and feature-rich product.

Both multidimensional and tabular have strengths and weaknesses today, and one is not clearly superior to the other. In many cases tabular performs better, and its models are simpler to design and use, but the platform still lacks equivalent commands and advanced capabilities. In the near future, the tabular product may inherit all of the features of its predecessor and the choice may become clearer; or perhaps a hybrid product will emerge.

 



Isn’t a Tabular Model Just Another Name for a Cube?

No. …um, yes. …well, sort of. Here's the thing: the term "cube" has become a de facto term used by many to describe the general concept of a semantic model. Technically, a "cube" is a multidimensional structure that stores data in hierarchies of multi-level attributes, with pre-calculated aggregate measure values at the intersection points between all those dimensions and at strategic points between many of the level members in between. It's a cool concept and an even cooler technology, but most people who aren't close to this product don't understand all that. Users just know that it works somehow, but they're often confused by some of the fine points… like the difference between hierarchies and levels. One has an All member and one doesn't, but they both have all the other members. It makes sense when you understand the architecture, but it's just weird behavior for those who don't.

Since the tabular semantic model is actually Analysis Services with a single definition of object metadata, certain client tools will continue to treat the model as a cube, even though it technically isn't. A tabular Analysis Services database contains some tables that serve the same purpose as measure groups in multidimensional semantic models. The rest of the tables are exposed as dimensions, in the same way that cube dimensions exist in multidimensional. If a table in a tabular model includes both measures and attribute fields, in certain client tools like Excel it will show up twice in the model: once as a measure group table and once as a dimension table.

(more to come)

 



Tabular Model Design: The Good, the Bad, the Ugly & the Beautiful

As is typical for a newer product, the model designer's usability isn't perfect, and there's a lot to consider before trading up from a technology that's been around for a long time.

This post summarizes all that is good, not so good and beautiful about the next generation of SSAS tabular: in-memory BI semantic models.

 



Preparing Data for a Tabular Model

I've taught a few PowerPivot training sessions to groups of business users (now, remember that tabular SSAS is really just the scaled-up version of PowerPivot). Admittedly, I'm more accustomed to working with IT professionals, and when I teach or work with users I have to throttle my tendency to go deep and talk about technical concepts. In these classes, I find myself restating the same things I've heard in conference presentations and marketing demos about PowerPivot data sources, like "you can import just about anything into PowerPivot". As I read the bullet points and articulate the points on the presentation slides to these users, I have this nagging voice in the back of my mind: I've spent many years of my career unraveling the monstrosities that users have created in Access, Excel & Visual Basic.

Whether stated or implied, there is a common belief that a PowerPivot solution doesn't require the same level of effort to transform, prepare and cleanse data before it gets imported into a data model. For many years, we've been telling these users that it takes a serious effort, at significant cost, to prepare and transform data before we can put it into a data mart or cube for their consumption. In a typical BI solution, we usually burn 70-80% of our resource hours and budget on the ETL portion of the project. Now, using the same data sources, users are being told that they can do the same thing themselves using PowerPivot!

Data Modeling 101 for Tabular Models

One of the things that I really enjoy about building tabular models is that I can have my data in multiple structures and it still works. If the data is in a traditional, Kimball-style star schema, it works really well. If the data is normalized, as it would be in a typical transactional database, it still works. Even if I have tables of a hybrid design, with some characteristics of both normalized and dimensional models, it all works beautifully.

Here's the catch: one of the reasons we build dimensional data models is that they are simple and predictable. It's really easy to get lost in a complex data structure, and when you start combining data from multiple source systems, that's where you're likely to end up. Getting business data into a structure that is intuitive, behaves correctly and gives reliable results can be a lot of work, so be cautious. Just because a tabular model can work with different data structures doesn't mean that you don't need to prepare your data, clean it up and organize it before building the semantic model.

The classic star schema is one of the most effective ways to organize data for analysis. Rather than organizing all data elements into separate tables according to the rules of normal form, we consolidate all the measures that are related to common dimensional attributes, and that share a common grain (or aggregation level), into a fact table. The dimensional attributes are stored in separate dimension tables, one table per unique business entity, along with its related attributes. Any group of measures not related to the same set of dimensions at the same level is stored in its own fact table. In the example, invoice measures that are related to stores and customers, and recorded every quarter, are in one fact table. The sales debit records for customers and stores, recorded daily, go in a different fact table. The account adjustments don't record the store key, but they are uniquely related to accounting ledger entries stored in the ledger table. Note the direction of the arrows, showing that facts are related to lookup values in the dimension tables.

http://sqlserverbiblog.files.wordpress.com/2013/07/image5.png 

Exhibit 1 – A Fully Conformed Star Schema


If you can pound your data into the shape of a star schema and this meets your requirements, that is what I usually recommend. It's a simple and predictable way to organize data in a well-defined structure. Now, let's look at a variation of this approach that has characteristics of both the star schema and normalized form. We'll call this a "hybrid" model.

The following hybrid schema contains two fact tables in a master/detail relationship. The cardinality of the Invoice and LineItem tables is one-to-many: one invoice can have multiple line items. This would be considered a normalized relationship, with the InvoiceID primary key related to an InvoiceID foreign key in the LineItem table.

The Invoice table contains a numeric measure called Invoice Amount that can be aggregated by different dimensional attributes, such as Store Name, Customer Name, or any of the calendar date units in the Dates table that are organized into a natural hierarchy (with levels Year, Month and Date). To facilitate this, the Invoice table is related to three different dimension tables: Stores, Customers and Dates. Each of the dimension tables has a primary key related to a corresponding foreign key in the fact table. The LineItem table also contains numeric measures and is related to the Products table, another dimension table.

http://sqlserverbiblog.files.wordpress.com/2013/07/image6.png

Exhibit 2 – A Hybrid Star / Master-Detail Schema

This semantic model supports two levels of aggregation with respect to the Invoice and LineItem records. If I were to browse this model in an Excel pivot table and put all the stores on rows, I could aggregate the Invoice Amount and see the sum of all Invoice Amount values for each store.

<< need pivot table graphic here >>
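Behind a pivot table like that, the heavy lifting is done by simple measures. As a sketch (the measure names, and the Line Amount column, are my own placeholders rather than names from the exhibit), the two grains of this model might carry measures like these:

  Total Invoice Amount := SUM(Invoice[Invoice Amount])
  Total Line Amount := SUM(LineItem[Line Amount])

When Store Name is on rows, the relationship from Stores to Invoice filters the invoice rows for each store, so Total Invoice Amount returns each store's total without any extra expression work.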

<< Stubbed -- need content below >>

 



Are There Rules for Tabular Model Design?

Oh, absolutely. Tabular SSAS and PowerPivot allow you to work with data in a variety of formats – structured & unstructured, dimensional & normalized. You have a lot of flexibility, but there are rules that govern the behavior and characteristics of data. If you don't follow the rules, your data may not meet your requirements in the most cost-effective way.

This reminds me of an experience from when I started high school. In years past, all of the classes throughout the day started and ended on the same schedule. The bell would ring, and then every student in the entire school would flood the hallways and hurry to the next class. It was chaotic, and in those 5 minutes social order was reestablished: the seniors would trample over the freshmen, the football players would stuff the math nerds into lockers and trashcans, and friends could meet up and skip classes they didn't like. As soon as the bell rang again, classes started and the hallways were empty. Anyone in the halls was either skipping class or on their way to the office with a detention slip or a bloody nose. This is the way it had always been done. It was a system that worked, and everyone knew what to expect, despite the challenges and inflexibility.

The year before I started high school, the school district decided to pilot a new program called "modular scheduling". This meant that there were several different schedules, with classes starting and ending at different times, the goal being to even out the hallway traffic and reduce stress. For a few very disciplined kids, the program worked and academics improved. The atmosphere was more like a college than a high school. For the rest of us, it was a nightmare. Many students wandered the halls, lost, with their printed schedules in hand, trying to figure out when and where their next class was. If a student needed to change their schedule, it was nearly impossible because the class schedules would rarely align. However, the hallways were always peaceful, and students acted much differently toward each other. The pilot program was dropped after two years, and we went back to being a normal, crazy American high school where everyone knew how things worked.

My point to this story is that well-established practices are predictable and work because we are able to plan and design around them. PowerPivot and, by virtue of being based on the same underlying architecture, tabular SSAS allow, if not encourage, designers to bend and break traditional rules of solution design by condensing data cleansing and governance rules into a few simple clicks in a wizard dialog. Whether you use these simple shortcuts or more traditional approaches, some of the laws of nature that govern the flow and behavior of data must still be followed.

Rule #1: Model the data source

It is true that a tabular model can be used to normalize or dimensionalize complex and disjointed data from multiple sources or disparate tables, but this work is often best performed in a relational database, where more sophisticated tools are available, before attempting to do all the transformation work in the model. Using SSIS packages, or even just views and SQL queries, transform records into subject-area dimension tables with related attributes. The Kimball dimensional modeling methodology is still king. If done correctly, the tabular semantic model simply sits on top of the dimensional model, with one-to-one mappings between model tables and dimensional database tables.

Rule #2: Cleanse data at the source

Data quality issues, like blank, duplicate or unknown values, are very common and should be dealt with before the data gets into the model. It's almost inevitable that you will encounter data quality issues after data is imported and will have to work around them temporarily in the model design. However, it's critically important to track these issues and clean up the data in the scripts or ETL packages that populate data marts and other data sources.
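For example, a temporary patch for blank values might be a calculated column like this sketch (the Customers table and Customer Name column are hypothetical), with the original column hidden; just don't let the workaround become permanent:

  -- Calculated column in the Customers table; substitutes a label for blanks:
  =IF(ISBLANK(Customers[Customer Name]), "Unknown", Customers[Customer Name])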

Rule #3: Simplify object names

It's common for application databases to contain abbreviated, condensed and cryptic column and table names. Table and column names should be changed to friendly names in the model so users are presented with plain and familiar objects. In simple models, you can rename objects as you go. However, this becomes increasingly difficult to manage in large models with complex designs and high volumes of data. In a large-scale semantic model with several tables and a lot of columns, use views to simplify tables and provide friendly column names. When changes are made to the tables or views, open Table Properties under the Table menu to refresh column names and add any additional columns.

Rule #4: Delete or hide what you don't need

The reason that tabular models are fast is that data is compressed into a lean, compact in-memory structure by the xVelocity in-memory storage engine. The xVelocity (formerly called VertiPaq) engine works best with narrow tables of few columns, with conservative column sizes and sparse, redundant data values. Leaving unneeded columns in a table bloats the model and slows down queries. Columns can always be added to a model, so when in doubt, err on the side of leaving unnecessary data and objects out of your design and add essential objects later if needed. Some columns are used internally, for identifying records, for sorting values or for use in calculations. As a rule, hide every column or table that users don't need to see when they browse the model in a client tool.

Rule #5: Keep relationships simple

The standard practices of both star-schema and normalized schema design will work in tabular; in some ways, better. However, the tabular architecture only works with simple relationships and may not be able to optimize more complex or less traditional table schemas. Since only single-column relationships are currently supported, tables must be related using one key value in each of the two tables. The key in one table must contain only unique values, and the other can have multiple rows with the same value (a one-to-many relationship). Relationship cardinality is not explicitly defined; the designer determines the "one" and "many" sides simply based on the existence of unique or non-unique row values in a column.
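When two tables are naturally related on more than one column, a common workaround is to build a single composite key on each side with a calculated column and relate the tables on that. A sketch, assuming hypothetical CompanyID and CustomerID key columns in both tables:

  -- Calculated column, added to both tables; the delimiter keeps
  -- combinations like 12|3 and 1|23 from colliding:
  =[CompanyID] & "|" & [CustomerID]

DAX implicitly converts the numeric keys to text, so the concatenation works without explicit casts.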

 



Tabular Model Design Checklist

Model design is very much an iterative process, with several properties and settings that need to be set as you step through the design of each table, field and measure. There are several properties that can be set to provide client tool behaviors and enhancements, but it's best to make sure the essentials are covered first. The following are the most critical settings that should be set in every model:

  • Give every table and all fields friendly names that are familiar and intuitive for users. These may vary by application but I encourage mixed-case titles with spaces and no punctuation.
  • For each lookup or dimension table that uses a primary key to define the uniqueness of each row, set the Row Identifier property to True for the key column.
  • For Date lookup tables, use the Table menu to set Mark As Date Table.
  • For the numeric columns in fact tables, hide all of the base numeric columns and then create a measure for each of these columns using the auto-sum menu feature.
  • Rename the auto-sum measures using friendly names. For example, if the base column name is sales_total, hide this column and then create a measure named Sales Total using the expression Sales Total:=SUM([sales_total]) (see the sketch after this list).
  • For each measure, set the Format property. Also, where appropriate, set the Show Thousand Separator and Decimal Places properties.
  • For fields that don't naturally sort correctly (like Month names), set the SortByColumn property to a field that will sort correctly.
  • For all numeric fields that should not be aggregated (like Year), set the Summarize By property to Do not summarize.
  • Hide all columns and tables that you don't want to display to users.
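As a sketch of the measure-related items above (the Sales table name is my own placeholder), the hidden base column would be exposed through friendly measures like these:

  Sales Total := SUM(Sales[sales_total])
  Average Sale := AVERAGE(Sales[sales_total])

The Format, Show Thousand Separator and Decimal Places settings are then applied to each measure in the Properties pane rather than in the DAX expression itself.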

Melissa Coates provides a handy list of these and some other recommended settings in her post titled Checklist for Knowing Your PowerPivot Model is Complete.

 



What’s the Difference Between Calculated Columns & Measures?

Both use DAX expressions. Calculated columns are evaluated for every row in a table and appear as a stored value on each row (they are computed in row context, when the table is processed). Measures return a single aggregated value for each cell, row group or filter combination in a pivot table or report, evaluated within the current filter context at query time.
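A quick sketch of the difference, assuming a hypothetical Sales table with Quantity and Unit Price columns:

  -- Calculated column (row context): entered in the formula bar for a new
  -- column, say one named Line Amount; computed and stored for every row:
  =Sales[Quantity] * Sales[Unit Price]

  -- Measure (filter context): re-aggregated for each cell in a query:
  Total Sales := SUM(Sales[Line Amount])

The column is materialized once when the table is processed; the measure is recomputed for every cell of every query against whatever rows are in scope.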

 



What are the Naming Conventions for Tabular Model Objects?

 



What’s the Difference Between PowerPivot and Tabular Models?

 



How to Promote a Business-created PowerPivot Model to an IT-managed SSAS Tabular Model

 



Getting Started with DAX Calculations 

 



DAX: Essential Concepts

 



DAX: Some of the Most Useful Functions

 



DAX: Some of the Most Interesting Functions

 



Using DAX to Solve Real-World Business Scenarios

 



Do I Write MDX or DAX Queries to Report on Tabular Data?

 



Can I Use Reporting Services with Tabular & PowerPivot Models?

 



Do We Need to Have SharePoint to Use Tabular Models?

 



What Do You Teach Non-technical Business Users About PowerPivot and Tabular Models?

 



What’s the Best IT Tool for Reporting on Tabular Models?

 



What’s the Best Business User Tool for Browsing & Analyzing Business Data with Tabular Models?

 



Survival Tips for Using the Tabular Model Design Environment

 



How Do You Design a Tabular Model for a Large Volume of Data?

 



How Do You Secure a Tabular Model?

 



How to Deploy and Manage a Tabular Model SSAS Database

 



Tabular Model Common Errors and Remedies

 



Tabular Model, Workspace and Database Recovery Techniques

 



Scripting Tabular Model Measures

 



Simplifying and Automating Tabular Model Design Tasks

 



Tuning and Optimizing a Tabular Model

How do you tune a tabular model? You don’t.

You can prevent performance and usability problems through proper design.
