3: Structuring and Referencing Data for Analysis within Excel
3.8 Apply concepts for a data analysis solution
3: Structuring and Referencing Data for Analysis within Excel
3.8 Apply concepts for a data analysis solution - Video Tutorials & Practice Problems
Video duration:
11m
Play a video:
<v Instructor>Let's consider this table</v> with scores for students in different grades. For our solution we would like to get the details for our top three students for a selected grade. We will take a step approach to getting the desired results so that the solution is easier to follow. For this exercise, I've already set up a structure and define names for our skill ranges, so we can focus on other concepts and formulas. The define names are listed in italics above each dataset. To start with let's use the filter function to get a list of students in grade one. For this formula, the first argument of the filter function is the student data table. The second argument filters that data table where greed equals one and the third argument provides the message that needs to be displayed in case there are no results found once the data is filtered. As we press Enter, we can see our filter dataset displayed in a spill range. As the next step we will use the sort by function in addition to the filter function to sort our filter list of students by their score. For this formula we replace the first argument of the filter function by the sorted list. The sort by function takes the original student data table and sorts it by the score column in descending order. We still maintain our filter for grade equals one. However, as we press Enter we noticed that there's an issue with this result. We still have a great two students showing in this list. Now let's review our formula again. Here the first argument of the filter function is a sorted list. However, the include argument expression is still using the unsorted table column reference for the index of rows to a return as part of the result. Since the sequence of records in the sorted and unsorted lists does not match we get an erroneous result. To fix this we will change the include argument to use the same sorted list that we're gonna return the results from. So we will copy the expression for the sorted list, replace the first part of the include expression. And since we're only comparing the grade column, we will bend that to the first argument of the sort by function and press Enter. As you can see, we now have the correct filtered list sorted by score in descending order. Next we will use the index function in conjunction with the formula we have authored thus far, to a list of students with the top three scores. For this we will use a couple of array constants. First we will use an array constant with three separate records, one for each row number we need from the filtered list of students. This is essentially the rank for students we need on each row of the new list. As we have already discussed, we use a semi-colon within array constants to specify new rows. So in this example, we will place the required ranks on separate rows of the array constant with each separated by a semi-colon. As we press Enter, we will notice that the result is a single column spill range. This is because we skipped the column argument in our formula and using the implicit intersection behavior, Excel returns the leftmost column by default. Now let's provide an appropriate column argument, so we get the name and score columns as part of the result. Note that we want to select the column fields as part of each row. And we have already discussed that a comma in an array constant can be used to separate fields in the same row. So in this example we will use an array constant where we specify the index of each field that we want displayed for every row. So in this case the name is index two, in score is index three. Then we close the curly brackets to confirm the array constant and then press Enter. As you can see, we now have a list of names and scores for the top three students. Now the setup will work for grade one. However, we would need to manually modify a few formulas if we want to display a similar list for grade two students. So let's add the ability to have a top three list be updated for a selected grade. While there are a few ways to implement this, we will stick to dynamic arrays for this example. Let's first generate a list of grades available within our table. But we'll use this list to populate a dropdown where we can select the grade to filter down our top three student list. In this case, we will use the combination of the sort function and the unique function. The unique function here gets a list of unique values from the grade column in our student data table and the sort function basically just sorts that in an ascending order. As we press Enter, we will notice that our grades in table list is consistent with our student data. Next we will define the name to referenced this grades and table list. For this typically we would click on the Name Manager button under the Formulas tab of the ribbon bar and then click on the New button to define a new name. However, in this case I have already defined a name referencing the list of grades and table. As be open this named definition, we will note that the name is listed as T3S which is just a short for the sheet name and then grades and table. It then references the starting cell of our spill range and then uses the spill reference rotation pound symbol to reference the entire range. In this example our defined name will always dynamically update for all grades in our student data table. We now click Okay to get back to the Name Manager and then click on Close to get back to our Excel sheet. After defining the name we will now use the data validation capability within Excel to use the list of grades for our dynamic dropdown selection. We will discuss this topic in more detail later in the course. But for the purposes of this exercise, please note that data validation with an Excel is used to restrict the type of data, what are the values that the users can enter in a specific cell. One of the most common data validation uses, is to create a dropdown list also known as a data validation list. Before dynamic arrays there was no easy way to create dynamic data validation lists. We were restricted to functions like Indirect, which have their own challenges due to the use of static text references. So now let's see how easy it is with dynamic arrays to create a dropdown from our dynamic list. To do this we first want to select the cell where we wanna display the dropdown, then go under the Data tab on the ribbon bar, then to the Data Tools group and then click on the Data Validation button. Here we wanna select the List option for the Allow setting. And then for the source, we wanna provide a list of values that we wanna display in the dropdown or maybe a cell or range reference for a similar list. On the other tabs, we can provide a friendly input message for the users and also an editor message in case of invalid data entry. Once done we can click on Okay and then confirm if our data validation list has the options we expect. For additional testing, we can also update the grade for one of the students. Let's say three, and then made sure that our grades and table list is automatically updated for the new value. And also that our dropdown includes the new grade value. Once done we can revert back to the original data set and carry on with our example. So at this point with this spill range that contains the unique grades from our source data table. The define name that refers to this spill range and the data validation that uses the define name to create a dropdown list with the selection of the same unique grade values. We now have a structure in place to select a grade from a dynamic list of options based on our source data set. Next, we need to update our dynamic array formulas in the three data sets above. To do that we should probably define the name for the cell containing our dropdown list. This will help us to easily reference the selected value in the formulas above. In this case, I've already provided a name for this cell which can be confirmed within the Name box in the top left of the screen. Now let's update the formulas to use the new name. So for the first formula here where we're getting a filtered list of grade one students, who were placed at one with the selected grade. We'll do the same for the second data set here and the same for the third. As you can see that none of the results have changed at this point. That is because the current value for the selection from our dropdown is the same as what we had before, which is grade one. Now let's try to change the selection here to grade two and see the results. As you can see, the three lists above have automatically updated for grade two results now. So with this example, we've shown that dynamic array functions can be combined with other Excel functions to allow for more complex data processing and array output generation. They can also be used to define names for dynamic lists, implement dependent dropdowns, and generate valuable metrics for more advanced data analytics capabilities. During the next lesson, we will look at some practices for preparing our data for analysis and then look at a couple of more advanced data analytics solutions.