3: Building a Data Model to support self-service reporting
3.2 Introduce Dimensional Modeling
3: Building a Data Model to support self-service reporting
3.2 Introduce Dimensional Modeling - Video Tutorials & Practice Problems
Video duration:
14m
Play a video:
<v Instructor>In this sub-lesson</v> we're going to introduce dimensional modeling. We're going to define key terms and structures around dimensional modeling. We're going to discuss the benefits of a dimensional model and we're going to walk through a dimensional model. All right, so what exactly is dimensional modeling? Dimensional modeling is a methodology that centers around measuring business processes. And that's really what makes it in my mind so business friendly is its core center is really around a business process. So in our example here the business process is sales. Other examples of business processes may be inventory, payroll, time sheets, and things like that. So these are business processes that we see across many of our businesses. So the business processes are known as facts. So if you recall back in lesson number two when we went in and rearranged our queries in the Power Query Editor, we set up a group for facts. And that is where we're going to store the tables that contain the measurements of our business processes. So the things that we're trying to sum, average, min-max or count or basically do aggregations on. And when we measure business processes we need ways to describe those measurements. And the ways that we describe those measurements are by using dimension tables which serve as report labels or filters for our analysis. And the combination of facts and dimensions is what is known as a star schema or a dimensional model. So in our example, the business process is sales and the dimension tables we have are products, sales territories and order dates. So inside of the sales fact we are going to be measuring things like total sales, order quantity and maybe doing some analysis around unit prices. And the ways that we're going to go ahead and describe those measurements and aggregate them are by products, sales territories and order dates. So we're going to take a look at our sales by product, our sales by sales territories and our sales by order date. Now you'll notice in the last couple seconds I said the by word an awful lot. So whenever you're doing analysis and somebody says, "I want to understand something by," what usually comes after the by keyword is a dimension. And that is used to describe the business process that you're trying to measure. So what does this look like? So we've referred to fact tables, dimension tables, and this in aggregate is referred to as a dimensional model. And a dimensional model comes in two forms. One is what is referred to as the star schema where the fact table sits in the middle and one dimension table sits around the edges. So we've got a table and then the three-dimension tables that are important to our model, the territories, date and product. So this is what a star schema looks like. Alternatively, we can define things in the dimensional model as a snowflake schema. And that happens whenever we extend beyond one table from the fact table and we get into more parent-child relationships. So on the far right hand side we'll see we have our dim product table and then we have a separate table which contains our product category information. Now, if you would recall back in lesson number two, in the Power Query Editor we actually went through an exercise where we collapsed the product levels which contained our sub categories and categories into the product table itself. So we could have left that alone and actually built relationships or what we'll do here momentarily in this lesson is build the relationships up. But back in Power Query, we actually decided to go ahead and collapse our tables down into a star schema form. And that star schema form looked something like this where we took the product levels and merged it into the product table to form that single tables so that we only had one table out from the actual fact table itself. Okay, so we're going to go ahead and open up our Power BI Desktop file that supports this particular sub-lesson and show you an example of a star schema inside of Power BI and a snowflake schema. So let's go ahead and do that. So I'm going to go over to the files from my GitHub repository. And the first one I'm gonna open up here is this one called L0302 which is our star schema. So let's go ahead and open that Power BI Desktop file up first. Okay, so the desktop file is now open. So I'm going to go down to my model view here and here is the star schema. So in the middle we have our sales table. We have our order date dimension, our sales territory dimension, and our product dimension. Now I have purposely arranged these tables in a star schema form by putting the fact table in the middle and making the dimension table sit on the outside. Now for, at the end of the day the way these tables arrange doesn't really matter. But what I like to do when I'm training these concepts is to actually start mimicking the tables the way they would look in the actual star schema diagram. So that's the purpose of keeping things in this particular form here. So now let's go ahead and open up the snowflake schema Power BI Desktop. So I'm just going to go ahead and close this file out. And once again, I'm going to click on the one right above the L0302 snowflake schema. So I'll just go ahead and open that up and we'll wait for the desktop to open. Okay, so the desktop file is open. Once again, let's go down to the model view here so we can take a look at the tables. Now this one is arranged in a similar fashion, I've just got some things in different spots here. Once again, in the middle I have the fact table. So I've got our sales table. And around the edges I've got my order date and the sales territory. And then the product table is the one that was snowflaked. So I left the product table alone here. So we can see that we've got the key name prefix, standard cost, sub-category key, all within here. And then the way that we would roll those individual products up is using the product level. So if you recall back in lesson number two, we had actually gone through an exercise where we collapsed product level into product to formulate a star schema. But in this example here we've left it in snowflake schema format. Now there's some advantages to doing this. So some of the advantages are, once you get into larger models there are some performance benefits to snowflaking your tables out like this, but that's beyond the scope of this first class. One of the main reasons that we collapse things down to a star schema to start with is typically it makes it a lot easier for your end users to consume a model. So the theory here is the fewer the number of tables that you have for somebody to consume when they're building reports, that is just generally a better thing. So from an end user self-service perspective we would like to make our models contain the fewest number of tables possible. So we go ahead and collapse this down here. One of the other reasons that we will collapse things down into a single table as well is one of the rules in behind creating hierarchies which we will eventually do with product is that all of the fields that participate in the hierarchy have to exist in the same table. So if we did want to build a part product hierarchy the way this was set up, we would then have to go through some DAX expressions to bring the product category and subcategory fields into the actual product table later on so we could go ahead and build that hierarchy out. Okay, so one last thing that I wanted to show you was that maybe you have never heard the terms fact tables, dimension tables, star schemas, snowflake schemas, dimensional modeling, and all the terms we've mentioned so far. But if you've used tools like Excel for a long time you've already been doing analysis in a form that is in a lot of ways like a star schema or at least the analysis you produce through pivot tables mimics a star schema. So what I'm going to do is I'm actually going to show you an example of that by opening up a spreadsheet that we have provided for you in the GitHub repository that is called sales flattened for pivot tables. So let's go ahead and show you where that is. Okay, so I've got my directory open here with all the GitHub files. And if I just go up a couple of directories here and go into the data folder and click on additional data, you're going to see that there's an Excel spreadsheet in here called sales flattened for pivot table. So I'm going to go ahead and open that up and show you roughly what the data would look like if you had it inside an Excel spreadsheet and you were setting it up for consumption within a pivot table. So keeping in mind, this data we have here in the spreadsheet is very similar to the data we have in our star schema back in Power BI. Now I have removed a couple of columns just for simplicity here, but we can see we've got a sales order. We've got an order date which is part of the dimensions in our other model. We've got a product ID, product name, product category. I skipped out on the product sub-category. And then we've got some of the measures that we're trying to average, sum, min-max and count. So with this table here that is all flattened out and ready for consumption of a pivot table, I'm just going to quickly go ahead and build a pivot table on this and show you what this analysis looks like in a pivot table and how it loosely maps into a dimensional model. Okay, so this isn't really part of this course here to show how to build pivot tables but let's show you this anyhow because many of you taking this course are likely familiar with creating pivot tables. So inside of this Excel table here I'm going to go to the insert tab, click on pivot table. And I'm going to select the default here and just say, we're going to build a pivot table from table number one. And we'll go ahead and click okay. And we're going to see here that I have a pivot table in front of me here. I'm just going to go ahead and get rid of my grid lines. And now what I have over in the bottom right hand side of the screen here is filters, rows, columns, and values. So anything that sits over in the values bucket over here, these things right here are essentially what would come out of a fact table. So the things we're trying to average, sum, min-max and count would show up over here in the values bucket. We often in the rows and columns, we'll put the things that describe the measurements over here. And these are equivalent to the dimension tables that we build out inside of the Power BI Desktop. And also we can have filters in here and keeping in mind, I've described dimensions as either being report labels or filters. So that is what we have around here. So let's just really quickly go ahead and arrange this data into the pivot table buckets here and show you what that would look like. So I'm going to go ahead and drag my order date down into rows. I am going to go and maybe grab my, where's my products here. So I'm going to find the product category here, put that in columns. And maybe what I'm going to do here is take my line total and put it down here into values. So we can see this pivot table has been built out on the screen. And really what I have here is the things I'm aggregating down here in the bottom. This is roughly the fact table. And the dimension tables are around here and they are forming the labels and filters for our report. Maybe I even want to take this one step further and go here and maybe I'll add a, you know, store full name here in the filter, something like that. Just to show you that we have the ability to do that. And we can go in here and maybe choose one of the actual stores in our model to filter this data a little bit more. Either way, the columns, filters, rows, these are mapping into our dimension tables and what shows up down here is in our fact table. Now that is despite the fact that this data is arranged in a flattened Excel table that is meant to be consumed by a pivot table. Inside of Power BI we steer away from flattening data files out like we would if we were doing things for a pivot table. We leave things in their native state. We build fact tables. We build our dimension tables like we've done so far in the Power Query Editor. And Power BI is able to consume those models very quickly, very efficiently. And you'll find that their business users really can start digesting dimensional models quite easy because they're very similar to what they've seen in the past if you're able to draw the analogies and comparisons between the two. Okay, so we're going to move ourselves back into the Power BI Desktop now. And now back on the desktop I actually went into the model view and here is our star schema again. So once again, we have sales right here and the values in here will actually show up in the values bucket within the pivot table and our order date, sales territory and product, these will be used for our rows and columns and filters if we were actually going and building pivot tables, but we're not. We're getting our data set up for consumption in Power BI and trying to build things optimally. Okay, so that is it on this particular sub-lesson. This is just a very brief introduction into dimensional modeling and star schemas. You'll see how we're going to build this step as we move through the rest of the lessons here. And as you progress further and further down your journey in Power BI, you're going to want to become more and more familiar with dimensional modeling and the techniques around it because that's how you're going to build models that are predictable and perform in Power BI over the course of time.