2.14 Unpivot data - Video Tutorials & Practice Problems
Video duration:
13m
Play a video:
<v Instructor>Occasionally, we get data from people</v> that isn't structured into traditional rows and columns. Sometimes when we ask for information, it comes in pivoted form. As you can see on the screen here, I have somebody's budgeted product sales by year and month in spreadsheet form. And they've already got all the data pivoted out. So that we have, just to show you here at the top of the screen. We have 2019 which should go across all these values, here's the months. And then each one of these intersection points is the budget for that particular month within 2019. So we can see for accessories 2019, January, we have no budget. For accessories February, 2019 we have no budget as well. And for the rest of the year. Aside from December. So December, we can see has a accessories budget for December, 2019 of 106. And the 2019 total is 106 as well. What we're going to do to take this particular dataset and get it put in traditional rows and columns is use what is called the unpivot functionality that is available in Power Query Editor. And what that is going to do is basically what I have just described when I was showing each one of these intersection points. Is take each one of these cells here inside of the Excel spreadsheet. My guess at this point, it's an Excel, is it takes each one of these cells here and each cell will turn into a row with the values of accessories no value a month of January in 2019. Accessories, no value for February, 2019. And if I go down to bikes, we have values of bikes, $252 being the value, January being the month and 2019 being the year. So what it amounts to is each one of these intersection points right here. So every cell inside of this particular table will end up as a different row. And then we're gonna have to go through and do some cleaning of that dataset to actually get this into rows and columns, plus a clean set of rows and columns. So we only have values for all of this right here. And we can clean out all these totals that are sitting here and make sure that we repeat these values all the way across. So let's go ahead and do that. So back in my spreadsheet that I'm using to consume data into, I'm going to go to the get data functionality. I am going to go over to this file and choose workbook and go ahead and find where the workbook happens to be. So I navigate to this particular location right here. And I'm choosing this file here called pivot examples, I'm gonna click on import and let the window go ahead and take a look. And here I can see I've got my sheet called budget data. So instead of pulling out of a table this time, I'm actually pulling it off of the sheet. And if I scroll through here, I can see the, yeah, this looks like the data I had back in that spreadsheet 'cause I can see a year and then the months and the categories repeating themselves here, along with the grand total and then all the different cell values right here. So what we're gonna do is go ahead and use the unpivot functionality plus a few other cleansing features to go ahead and get this into a cleansed set of columns and rows. To do this, we will click transform data 'cause we want to get into the power creator to do some cleansing. We'll let the dataset go ahead and pop up. And the first thing we wanna do here is we wanna take the header information right here we wanna push it down into a row value. To do that, we are going to go to the transform tab. We're gonna go to the table grouping here and where it says use first row as header we're gonna click on this and say, use headers as first row. So we're gonna go ahead and push this down one level like so. And the next thing we're gonna do here is we wanna get rid of the rows that are blank. So we have nulls across all of these columns here. So we're gonna use a function that we use before. We're going to go up into our home tab. We're gonna go into remove rows and we're gonna say remove blank rows. So go ahead and do that. And that got rid of all of our blank rows. Next what I want to do, is I actually wanna get rid of this very first row. So keep in mind, this is the one I pushed out of the headers a moment ago. And now I wanna get rid of that. So I'm gonna go remove rows and say remove top rows. And I'm gonna go ahead and just choose my top one row like so. Go ahead and click okay. And at this point, I'm ready to go and transpose this table of data. I do that by going to the transform tab up here and I clicked transpose. And what that's gonna do is flip the dataset around. So I'm just basically going to twist the dataset. And I end up with something along these lines here. Next what I'm going to do is I'm gonna push this row right here into my header. So I'm gonna click on the use first rows headers like so click that and it will move those up into the headers. And what I wanna do now is I want to go and change the data type of column number one. So right now, this data type, is the any data type because Power Query is not really too sure what to do with this data type. We want to go ahead and turn this into a text data type. And the reason we wanna do this is let me just show you before we go ahead and do it is if I click this and want to do some filtering of data, my filtering is very limited when power query does not know the data type, I can have an equals and does not equal. But what you'll notice here in a moment here is if I go over here and just cancel that out and click on that and choose the data type to be text, and now go over to my dropdown box filter. I have a whole range of different filtering that I can do to that data there. But before I do anything with that, let's go ahead and take advantage of the fill function 'cause we know that we want this 2019 to repeat itself down across all of these rows here. So wherever these nulls are, we know that this is January, 2019, this is February, 2019, March, 2019. When we took a look at that on the original source spreadsheet, it was implied by the way that the report was set up, that 2019 was to repeat itself across all of the particular columns back at source. But now we wanna actually make it repeat through the rows. So what we're gonna do here is we're going to go to the transform tab under the any column grouping, go to fill and we're gonna say, fill down. And we're gonna notice now is that 2019 repeats itself all the way till it hits the first valid value and 2020 is gonna repeat itself all the way down as well. So the next piece of cleansing we can do is we don't want these columns in here to have the word total (indistinct), 'cause we can go ahead and compute those totals later on. We don't actually want these in our data set. They can be a confusing piece of things. So what we're going to do is we will click on the dropdown box here and we're going to go to the text filters and we're not gonna go over here to text filters and say does not contain. And what we wanna do here is we're going to type in total and we're gonna do this in the correct case because the power query language, the M language is case sensitive by default. If I go ahead and/or say does not contain total click, okay, we're gonna notice now that all of those total rows disappeared out of our dataset. So our dataset is now actually starting to look pretty good. The next thing we wanna do here is the column here that is now called column of one or has been called column number one for a little bit here. Let's go ahead and rename this. We wanna go ahead and call this year. So we're just going to double click on this and see year, oops, let's spell it properly. And over in here, we're going to call this one month like so. And after that renaming is done, what we can do here is we are going to go and twist this dataset so that we end up with what are called attribute value pairs. So each one of these intersection points right here. So this null in accessories will be a row. This will be its own row, its own row, its own row. So right here, this first row here is gonna generate four rows. And what we'll end up doing is generating four rows by 24. So that is going to be just under 100 rows, doing my math really quickly here, approximating things. And that's going to turn these all into a number of rows. So what we're gonna do is turn and click on the year and click on the month columns. And what I'm gonna do is go up into my transform tab here. And what I'm gonna do here is go over to the any column group once again and click on unpivot columns. And what I'm gonna do here is click this unpivot other columns. And what it's gonna do now is take our dataset and twist it into attribute and value pairs. So keep in mind here that I mentioned that we would end up with a number of rows for each one of the values we had back in the previous dataset. So let's go ahead and take a look at what that looks like here in a sec. So we can see the dataset has been pivoted, but what seems unusual if we take a look in the bottom part of the screen here, is it says, there are four columns and 74 rows. And we said just a moment ago that we should have just under 100 rows because if we go back a step here and we see that we had four different columns here and there's 24 rows that should take us to 96 different rows that should come across after we did this unpivot. And we only ended up with 74. What ended up happening is each one of these values here. If I just go back to my previous step, that had a null value in it is discarded from the dataset. So if we count these, we would see that we had 22 new rows will be formed that had a null value in it. And those would get excluded from the dataset. So if we go back over here, I have the 74 rows and that's basically the explanation in behind that. So the next thing we can do here now is go ahead and we can see these rows here that say grand total in them. So what I wanna do is I'm gonna go ahead and filter those out. So I'm just gonna click this drop down box and say text filters. And once again, I'm gonna go in here and say, does not contain, let's go here and type grand total. Like so. Go ahead and click okay. And now the grand total have been removed out of the dataset. The last thing we need to do here now is let's give these attribute value pairs, a proper name, and this right here is called product category. So we'll just go product and type category like, so. And over here in this other column, we want to actually call this, so call this budget amount because this was the budget amount. So budget amount like so, and now we have essentially taken our dataset that was in a report form and actually twisted it into traditional rows and columns. The last change I would probably wanna make here is this budget amount instead of having it be a whole numbers, let's go ahead and put this into a currency value like so, so that we have our values looking like that. All the other data types look just fine. All right, so that is a pretty quick look through how to take a dataset and unpivot it from a pivoted form. And even though that seemed like an awful lot of steps and may have seem a little strange in terms of how we actually, how we did that, this pattern will repeat itself over and over again, as you start doing unpivoting datasets like this. So you're gonna find that the way that you do this is similar across datasets that look like this.