2: Importing and transforming data using Power Query Editor
2.12 Merge queries
2: Importing and transforming data using Power Query Editor
2.12 Merge queries - Video Tutorials & Practice Problems
Video duration:
10m
Play a video:
<v Instructor>In this sublesson,</v> we're gonna discuss why we might wanna merge queries. We'll discuss common challenges, and of course, we'll go through a demo on how to merge. All right, so let's go through a quick little demonstration here first through this animation to describe what is happening in the merge queries process. So we have two queries here, they both have the exact same column names, just called Column. And we can see that there are animals in there. We got in query one, a Dog, Cat, and Elephant, and in query number two, we've got a Dog, Cat, and Tiger. So we see there are some overlaps between these two data sets and some differences. So that's gonna be important when it comes to describing the different mergers that we can do inside of our queries. Okay, so the first thing we're gonna do is let's go through and perform what is referred to as a LEFT JOIN. So what a LEFT JOIN will do is say, bring all the rows from the left hand query, whether or not there's a match over in the right hand query. So we can see the results of what would be a LEFT JOIN down below, Dog was able to with Dog, Cat matched with Cat, Elephant came through as a result, but there was no match to Elephant over in query two, but that was okay. We said, we want to bring all the rows from the left hand query, regardless as to whether there is a match in the right hand query. And keeping in mind that query one is considered the left query, and query number two in this example is considered the right query. Okay, so now moving into the next example here, we're going to go through and do what is called a RIGHT JOIN. So what we're gonna do here is we can see that if we did what is called a RIGHT JOIN, it would bring all the columns from the right hand query, whether or not it was able to find a match in the left hand query. So once again, in this case, Dog and Cat were able to match between the two. The only difference here in this join type, is that Tiger was able to come through in query number two, even though there was no match over in query number one. Okay, so let's move to the next one. So next is what is referred to as the INNER JOIN, which only gives you the matching columns. So in this example here, Dog and Cat matched with both datasets, and those are the only rows that get returned because they successfully match between the two queries. And the final one we'll talk about here is what is referred to as a FULL JOIN. So what a FULL JOIN does is it actually brings all the rows from each one of the columns, whether or not there's matches on either side. So we can see that the Dog and Cat matched up perfectly like they normally have done, and Elephant comes over as well but there is no match to query number two. Tiger comes over as well in query number two even though there is no match over into query number one. So let's go ahead now and get ourselves into our Power BI desktop file and demonstrate using merge queries. Okay, once again we have our get hub files open. I'm going to choose the file that corresponds with sublesson number 12. In this particular lesson, let's go ahead and open that file up, and give it a moment. Okay, so we have our power BI desktop file open. I just wanna draw your attention to the Fields pane on the right hand side of the screen there. And what we ultimately wanna do in this merge query exercise is we're going to merge the Product table together with the Product Level table. What we need to do is go back into the Power Query Editor and actually merge the queries together, so that we could deliver one unified table out to the data model that contains all the information that we need to do product analysis. And keeping in mind that that analysis happens to be analyzing individual products, and then allowing ourselves to roll up from Products to Product subcategories, and then to Categories. So let's go ahead and get ourselves into the Power Query Editor by clicking the Transform data button here. And once we are in the Power Query Editor, what we're going to do is go over to our Product query first, so let's go ahead and highlight on that. And what we had done here originally is we went through in a few sublessons ago, we went through and did some column renames, and we did some data type changes but what we ultimately should have done back over in one of our previous sublessons was change the Subcategory key column from a Text data type over to a Whole Number. Now we're gonna leave it alone here for now 'cause I wanna show you what happens when we go to perform our merge, if the data types don't match. So this column has a Text data type, it's called Subcategory key. And ultimately what we're gonna do here is go over to our Product Level, and we're going to match it to this column right here. Now this one is called a ProductSubCategoryKey with no spaces. It happens to have the correct data type, but what we ultimately would have liked to done is actually go through and give the column names the same name. So let's not actually go ahead and do that for now 'cause it's not really gonna affect our merge query operations, but let's go ahead and invoke the process here so we can see the first challenge that we're gonna run into when we are on the Product query and look to move the Product Level into it. So what the Product query highlighted, I am going to go to the Home tab, go to the far right hand side of the screen, we see our combined group of commands here. I can see we've got Merge Query at the top here and I'm gonna click the dropdown box next to Merge Queries here, and have the option to Merge Queries or Merge Queries as New. So I don't wanna create a brand new query this time, similar to what I did over in the Append Queries. This time, I'm gonna take one queries values and merge it into the other. So I'm gonna choose Merge Queries this time. So what we're gonna see here is that the Product table is the primary table here, and now we wanna find a table to match up with it. So that table is going to be Product Levels. And what we can see here is in the Product Level table, we have got a column called Product Subcategory Key and ultimately we wanna match it to this one here called Subcategory Key. So keeping in mind it would be nice to standardize these names, although that's not overly important to go through this process because I, as the developer I'm seeing explicitly what I wanna match together. The bigger challenge here is the fact that it says, we need to Select columns that are of the same data type. And the problem here is the Subcategory Key over in the Product table is a Text value . So let's go ahead and get that fixed first. So what I wanna do here is I'm gonna go to the Product Subcategory Key column here, and let's go in the data type and actually change this to a Whole Number. Okay, so that's the first thing I wanna do. The next thing I'm gonna do is you know what, I wanna actually standardize these names. Let's actually go ahead and do that even though it maybe is not the most necessary step here, I really like to have a data model that has clean naming conventions. So I'm gonna go and call that Product or Subcategory Key as well. So now both columns have the exact same name, and the exact same data type. So what we'll do now is we're gonna go up and click on Merge Queries to re-invoke the process, did that from the wrong spot, so I'm just gonna click Cancel. Actually want it to be on the Product query first, and then go ahead and click Merge Queries. Now I got the Product table on top, so from the notation we spoke of earlier, this is considered the left table. And we talked about our merge queries back in the PowerPoint presentation, and the bottom table is considered the right table. So what I wanna do here is the right-hand table is gonna be Product Level, and what I wanna do here is I'm gonna choose the Subcategory Key in the Product table, and click it and match up a Subcategory key over here. And we can see down at the bottom of the screen it says, The selection matches 295 of 504 rows from the first table. So what that's saying is there's 504 rows in the Product table, and there were 295 matches that happened from the Product Level table. So that's perfect. What we're gonna do here now is click OK. So I can see that what has happened here is a brand new column has been added onto the back of my query here. And we can see that as referencing a table, so ultimately what's happening here is we have the Product table in here, we need to go and choose which columns from the Product Level table we actually want to bring into our Product port. So before we do that actually I just wanna jump back into my Merge Query step here to show you we can do this. Is in Merge Queries which was our last step, there's a little gear on the far right hand side here. So I'm gonna click on that, and that's gonna take me back over to the Merge Dialog Box. And as quickly I want to highlight here again, the join types. We're not gonna actually go ahead and change it, but here are the explanations of the different join types. So the Right Outer Join will do matches from the second table that match to the first, the Full Outer Join will bring all rows from both tables, and the Inner Join will only bring matched rows. And then there's two additional ones here called Left Anti, and Right Anti. We won't go through those. The first four are the ones we explained back over in the PowerPoint slide. So I'm just gonna click Cancel here again. We already have that step over here in our Applied Steps. What I wanna do next here is go to the Product Level column here, and there are these little table expanders on the right hand side of the actual column itself. If I go ahead and click that, I'm gonna get presented with a dialog box here. And what this is asking me to do is to declare which columns do I actually wanna bring from the Product Level table into the Product query. So I click the Select All Columns, now I'm back to blank. I'm just gonna say, I want the Product Subcategory and I want the Product Category, those only two columns actually want to bring in. There's no point bringing the Subcategory key 'cause one already exists over in the Product table. The next thing I'm gonna do here is I'm gonna click this check box that says, Use original column name as prefix. We don't need to use the original column name as the prefix, we just wanna bring the column names in as is. So I'm gonna go ahead and click OK, and what we're gonna see here now is that we have got the two brand new columns merged in from the Product Level table, those being Product Subcategory, and the Product Category column. All right, so we've successfully performed the merge that we intended on doing. So what we're gonna do here in our next sublesson, is we're gonna go and clean these null values up that we see in these two columns plus a number of other cleansing techniques we wanna perform on our datasets as we actually prepare them to go over into the data model for reporting. So that brings us to the end of this sublesson.