23.3 Fit a multilevel model with Stan - Video Tutorials & Practice Problems
Video duration:
6m
Play a video:
<v Voiceover>We have</v> seen how Stan can be used to fit a simple linear progression, but now we are going to use Stan to fit a varying intercept multilevel model. So to start, we're going to save a text file as a Stan file, and we'll call this one, House2.stan. We'll say yes, "We want to convert it to a "Stan type file." The data block will start with an integer, with a lower bound of zero, and for the number of rows. Then we're also going to have a number of boroughs, there will be five boroughs. This will be an integer, with a lower bound equal to one, because the first borough will be one, and an upper bound of five, because the last borough will be five. We will call it borough, and there will be an entry for each of the rows, and each of those entries will be, one, two, three, four, five. Then there will be a vector, and vectors are vectors of real numbers, of length n, per square feet, and lastly, a vector of size n, for value. Now the cool thing about Stan, unlike previous Bayesian languages, is that we just declared square feet. We are not going to use square feet, but it's still there and it won't effect our model, other tools that wouldn't be possible, so thankfully Stan allows that, now we declare our parameters. Now the vector of A's, this will be size five because there'll be one intercept for each borough. We are not going to have a slope for each borough, just an intercept. However each borough will have its own standard deviation, so we will declare real, this is as a standard deviation, lower equals zero, sigma, A. And we will have an overall sigma, so real, lower equals zero, sigma Y. And we will also fit a prior, on A. Because we have prior information on what's realistic values for the value per square foot, of housing in New York City. So that's another parameter we will estimate new_a. Now on the model block, we will say, "View A is distributed, "normal 50, 10." What we're saying is, we have a prior belief on what these intercepts will be. The intercepts will be modeled, normal, new_a, sigma a, so essentially we're modeling the standard deviation, we're modeling this prior new, and then we're going to model actual intercepts themselves, and the overall standard deviation. Now since this is a multi-level model, we're going to have a different A for each borough, we are going to build a for loop here, which looks very similar to our for loops, and here we will say that, "value i, is modeled, normal, "the mean is A, the intercept, for the borough, "for observation i, "and it's standard deviation is sigma Y." We save this, and then, open a new R script, where we will say, "House2 gets, "Stan "of house2.stan, "data equals list, "n equals n row of housing, "value equals housing dollar value per square foot." On the next line, square foot, even though we're not using it, we'll pass it in just to show we can, housing dollar square foot, and borough equals, as.numeric, as.factor, housing dollar borough. Close the factor, close the numeric, close the list, and then enter equals 100 for 100 simulations. It is now compiling the C++ code. You do not have to compile the C++ code each time, let's say you've fit the model and you get new data. You can just run simulations on the new data without recompiling the code. Take away this compilation step, and the process speeds up significantly, so you only have to do this the first time you construct the model, but not each time you fit a model. Now it's doing it's sampling. And we get some red errors because our model isn't converging nicely, but that's also why we run multiple chains. And that could be an indication that our model wasn't built properly, and we need to come up with different priors. Now that it's done simulating, we can print out the results, and we do see results we like. We see the third borough, which is Manhattan, is significantly priced higher, than all the other boroughs. Now the R Hat statistics aren't great, which means we still have work to do to improve the model, but this was a good start in figuring out what drives housing costs in New York City. While this may seem slower than frequentist statistics, and it is, Bayesian statistics let's you estimate models, that are overly complicated for standard frequentist methods. And when you have to use Bayesian statistics, Stan is a great way to do it much faster than the alternatives.