2: Importing and transforming data using Power Query Editor
2.8 Reduce rows
2: Importing and transforming data using Power Query Editor
2.8 Reduce rows - Video Tutorials & Practice Problems
Video duration:
6m
Play a video:
<v Instructor>In this sub lesson,</v> we're gonna simply talk about reducing the number of rows that are brought from a source system into the Power Query Editor and then eventually loaded into the data model. So for example, in this query we have here, we have five columns and seven rows. Column number one has several different values for the rows. We can see that the rows have the value of A, B and C. Let's just suppose that in column number one, we want to get rid of all the rows that have a letter A in it. We could then take advantage of the reducing rows capabilities in Power Query to go ahead and remove those rows from the query so they don't get loaded into the dataset. Now, this is one of the best practices we wanna adhere to is to only bring the rows of data into Power Query into the data model and it has to pass through the Power Query Editor first, only bring what you need and reduce out at the Power Query level the data that you don't. So let's go ahead and get ourselves into our sample Power BI file for this particular sub lesson. Okay, so I have my GitHub files open. I'm just going to double click on the one here called SalesDataMart SL 02 08 Start, go ahead and open that power BI desktop file open. Okay, so now that the power BI desktop files open we're gonna get ourselves into the Power Query Editor by your usual means by clicking the transformed data command will be brought into the Power Query Editor and the query we're gonna look at reducing rows from, is this order date query. But before we go ahead and get rid of the rows that we want let's just quickly go through the functionality here that is available in the Power Query Editor for removing rows. So if we are on the home tab and move over to roughly the middle of the screen we can see this group of functions here called reduce rows. So there's a number of different ways that we can go ahead and reduce the rows. So if I click the dropdown box next to this remove rows here, we can see that there is a way to remove the top rows from the data set. We can maybe remove the bottom number rows, remove alternating rows, get rid of duplicates. We can get rid of blank rows in our data sets and we can get rid of any rows that are errors. We also have the ability in this command here called keep rows, to basically just do the opposite. To say that we wanna keep a certain number of top rows, keep a number of rows from the bottom, keep a range of rows, maybe wanna keep the duplicates or keep errors. So we have a couple different commands in here to go ahead and reduce our rows. So, let's go ahead and get ready to reduce the data in our order date dataset. And we're going to do that by going to the date column here and we can click this drop down box right next to it. And here we can see that we're present with a number of other ways to filter our data. So if I want, I can go down to these date filters right here, and I can go in and say I wanna take a look at the rows or keep the rows that are after a certain date. So I'm gonna go here and click on after, let the dialog box pop up. And now I can see that the dialog box is telling me I wanna keep rows where the order date is after a certain date. So maybe I only want to keep rows in my data set that are after on 2017-01-01. So after that date. But let's just suppose that I wanna keep on or after. So I want to keep it as January 1st, 2017. In this dropdown box right beside it, I can see is after or equal to. So I wanna keep this as the date but instead of just saying after which would not include January 1st, 27 I can say is after or equal to. So once I've gone ahead and done that, I can click okay. And then now we'll notice that the dropdown box next to order date has a filter on it. And if we take a look over in the applied steps we can see that the rows have been filtered out to only keep the rows and the order date file that are on or equal to the 1st of January of 2017. And if you want, you can go back into that drop down box and see there's all kinds of different ways to go ahead and filter your dates. But that's good enough for this particular example here. So, the next thing that we're gonna do is we're gonna go to the product query. So I'll click on the product query here, and we know by doing some inspection, this data and this is something that you just have to know on this particular dataset. Is that there's duplication of information. So there are products in here that repeat themselves twice. So they essentially have the exact same product key repeating themselves two times in this particular query. And we wanna go ahead and do a D duplication of this. So the way that we're going to go ahead and do that, is with the product key highlighted, we're going to go over to remove rows, the drop down box and we're gonna go in and say remove duplicates. So before I do that, I just wanna draw your attention to the bottom right-hand side of the screen where we see the metadata that says there's six columns and 606 rows of data. So after I do this D duplication, my row count is gonna drop from 606. And let's just go ahead and run it. Say, remove duplicates, and we can see now that the row count has dropped down to 504. So there were indeed some duplicate product keys in this query that we just got rid of. And we're gonna see that later on when we move into the data modeling section, this is an ideal thing to do for any of the tables which are categorized as dimension tables. And we'll discuss what dimension tables mean once we get into lesson number three. Okay. So quite simply this unit or sub lesson is quite straight forward. We're just learning how to remove rows off our dataset in Power Query so that we're delivering the right information and the right amount of information over into our data model.