2.3 Load a folder of data - Video Tutorials & Practice Problems
Video duration:
10m
Play a video:
<v Instructor>So far, we have seen how to load</v> individual files into Get & Transform and through the Power Query Editor. But what if we have a number of files that all have the same structure. One approach would be to load each file individually. But over time, that would start producing a bit of a maintenance challenge. Because every time a brand new file came along, that was of the same type, we would have to go into Power Query Editor, and basically add in a brand new connection. In enter the folder data source. With this particular data source, we're able to put files that are of a similar type into a folder and have the folder data source treat all of those files as if they were one file. So this is fantastic. When you have a scenario where you are continuously delivering files that all look the same structurally, but have different names, and maybe they're being delivered to you as the consumer on a periodic basis. We can simply write one set of transformations that apply to the folder and then people can just at their leisure drop files that are complained into the folder, and they will load directly into the actual target. And as I mentioned, this saves a tremendous amount of maintenance. So what does this look like? On the screen, I have a example here where I have a folder name, and there's the name of the folder, plus a number of different files that exist inside of that folder. And they're of all the exact same type. So we can see we've got four different files with their names. So we've got a product accessories, bikes, clothing opponents, and they're all Excel files. So all four of these files exist inside of one folder. So let's show you what that looks like. So here, I can see that I have File Explorer opened to the folder name, L02S05 And I see the four physical files located in the folder itself. So with the means you had previously learned, you may be tempted just to connect it each one of these files individually. And then if somebody added a brand new product type into this folder, you may go and create a connection directly to that, and bring it into power query on a file by file basis, that has mentioned that's gonna be inefficient from a long term maintenance perspective. So what we're gonna do here is create a connection to this folder and treat these four files as if they were one. So to do this, we are going to go to the Data tab, we are going to click on Get Data. Go to From File and we're gonna choose From Folder. The folder name dialog box appears and I wanna go and browse to where that folder is. So I'm going to click on the Browse button, I am going to go to the location where my folder is. So I'm just gonna navigate to it here. So there's my folder, L02S05, click Okay. And now the path to that folder shows up in the dialog box. I click Okay. All right, it looks like I found four files inside that folder. With this extension, if I just move this box over a little bit, this is these are the files that I would be looking at here. And plus a little bit of additional metadata about those files, which we're not concerned about at this point in time. So what I want to do next here, is I'm gonna go down to the bottom here under combine. And I'm gonna click the drop down box next to it and see Combine & Transform Data. Give it a moment. And what is happening now is we get presented with a dialog box that is asking us, how do we want to go about combining these files. So some of the rules that were keep in mind here and we'll recap these in a few moments is that first off, we need to choose a sample file. By default, it's the first file in the directory. So I'm just gonna go here and just choose Product Accessories. I already know that all of these files follow the exact same structure. So they made the rules, there necessary to be loaded. But I'll discuss what happens if they don't here momentarily. So what this means here is the sample file is the accessories file. And what will happen now is all other files have to look like product accessories from a structural perspective. So with that one being shown as my sample, I go down here and say, well where do I want to get data from that sample file? And as mentioned, I like choosing my data out of Excel tables. So I'm gonna choose tblProduct gives me a sample of what that looks like. And now what's gonna have to happen is the other files that I bring into this particular transformation to follow this rule have to match the structure product accessories. So there has to be a table, Excel table, and each one of them called tblProducts. Otherwise, an error will occur. We'll discuss that in a second. And secondly, the column names that will be pulled in have to match what's in product accessories. So if any of the other files outside of product accessories have columns with different names, they will not be pulled in, it will only pull in the columns from the other files that match the column names of the product accessory files. Now, I know that sounds like a little bit of a mouthful, but we'll summarize these rules at the very end. So I'm gonna choose tblProducts. And the last thing I wanna choose here is this button at the very bottom that says, Skip files with errors. So this little checkbox right here, if I click this, what will happen is, the loading process will skip any files that don't structurally match product accessories. And it will do so silently. So it'll just skip over the file, and load what it can that matches product accessories into the target model that you want to load and carry on. If I did not check that checkbox, what would happen is if any files inside of that directory did not match the structure of product accessories, the load would fail, and then we'd have to go in and correct things. So the results are extreme, either it completely fails, or it skips over the erroneous files silently and you need to go and figure out how to fix that yourself. So if I know that all of these files are of the same structure. So I'm just gonna simply leave that unchecked. And I'll click Okay. And what will happen now is I will get taken into the Power Query Editor. 'Cause earlier on I chose Combine & Transform data. So now I'm sitting inside of the Power Query Editor. And I can see where I got all of my data from. So I can see that I have my query named L02S05. So it took the name of the folder itself. So let's go ahead and rename this to something more appropriate. So this is a new location where we can go ahead and rename our queries. So I'm gonna just call this products like so. Let's rename it to Products, let's get it spelled right, like that. Hit, Enter and now we're gonna see our query has been changed to products. I can also see here, my Data preview pane, is I have a little drop down box in this column next to Source.Name and here is where I can go in and take a look at all of the source files were loaded in. So I can click on this. And it will show me that I got some information from product accessories, bikes, clothing, and components. So these are the four spreadsheets that were loaded in. And you'll notice all four of those spreadsheets were brought together and treated as if they're one. And they're loaded under the query name Products. So now that I have completed that piece there, I can click this Close & Load I can say the Close & Load To. So now I'm gonna actually choose that option in a different spot. Remember, we did that the Import dialog box in the previous sub lessons. So here in the say Close and Load To, which is basically telling me to close the Power Query Editor, and load that data to a target location. So now I get back to my Import Data box, I'm gonna choose the only create connection like I've been doing so far. I don't wanna load this in the data model. And I'm going to click, Okay. So now we see over in my our Queries & Connections pane, an additional folder, which has same queries in here that are being used to help load that data from the four spreadsheets and treat them as if they're one. So I'm just gonna close these folders up here. So I'm just gonna go like this. 'Cause I don't wanna go and play around with any of those because that was created automatically for me through this transformation process. And now I can see I have this new query here called Products, which once again has a column called Source.Name showing me which one of the sources a particular row came from. Plus all of the additional columns I can see there's 14 columns in here was last refreshed at 12:47. This date has not been loaded anywhere. And this is the source of the actual folder itself. So just to recap here, the folder data source is very useful when you want to go ahead and load a number of files that are of the same type, and allow users to drop individual files into the folder. And as long as they're the same type as the sample file we saw, it will just treat them as if they were all one file. So the rules are, all the files have to have the same structure that is been declared in the sample file. That means the file types have to be the same, the columns all have to be the same. So the only columns that will get loaded from any other files that meet the structure are the ones that are in the sample files. And you do have the option to either skip the files with errors, or let the entire load fail if any errors appear in the actual files themselves and that brings an end to this sub lesson on loading folder data sources.