Delaney Henson: All right. I want to go ahead and begin by thanking everyone for joining us today for our third Unwritten webinar in which expert authors join student co-hosts to explain COVID-19 and what it could mean for the future of our economy, healthcare and culture.
These webinars have proven to be incredibly timely and insightful thus far and I am so thrilled to be moderating this one today, where we will attempt to demystify some of the data surrounding COVID-19.
So, to get us started, some quick introductions. My name is Delaney Henson and I am a student at the University of Louisville, where I study communications, marketing and creative writing and serve as a Pearson Campus Ambassador.
I am joined today by Laura Howe, the VP of Innovation Communications for Pearson, who will be moderating our live Q&A. Get your questions ready and go ahead and put them in the chat box and we will get to them in just about 15 minutes.
And now I would also like to welcome Dr. Rob Gould, Vice Chair of Undergraduate Studies at the UCLA Department of Statistics and Director of the UCLA Center for the Teaching of Statistics. In 2012, Rob was elected Fellow of the American Statistical Association.
Our favorite way to begin each of these conversations is to go ahead and turn it over to our author expert for a two-minute hot take on the most important thing people should know right now about what is happening in our COVID-changed world.
So Dr. Gould, please give us your hot take.
Dr. Rob Gould: All right. Thanks! So I guess I want to begin by saying that I think of myself professionally as a statistics educator, which means my scholarly activity is really oriented towards improving statistics education and finding ways to improve it. And one of my beliefs about statistics is that it’s something that everyone can and should do. So I would like to live in a world where everyone who has questions about this pandemic could find the dataset they need themselves that answers those questions and make sense of it and be better informed by their own analyses as well as by critically reviewing others. So that’s the world I want to live in.
What’s my hot take now? I guess the most striking lesson that comes to me is, we live in the age of big data. We have heard this. It’s been great for our careers in statistics. There is a lot for the employers; statistics and data are kind of driving our culture, but it seems that it hasn’t really helped prepare us very well for this pandemic. We haven’t really seen yet a great application of data and statistical analysis that would better prepare us.
And so the question for me as an educator is what can educators provide that would make us better prepared for these sorts of things. I think the main reason we haven’t been well prepared is we haven’t had good data and as I am thinking about who needs to be educated, I am kind of seeing two groups of people.
One group are the people -- mostly civil servants I think who are posting the data that come out of this, the number of new cases, the number of deaths and posting it for the public, but maybe not posting it in a way that the public can easily consume or that answers/addresses the questions that the public might ask.
And then the other is for the general public about trying to help them become more critical consumers and critical doesn’t mean skeptical about everything. It means understanding what conclusions we can and can’t make and how far we can kind of push our reasoning with data.
So I think what I would like to see is improvements in those two groups of people in our educational process.
Delaney Henson: Yeah, thank you for sharing Dr. Gould! I think you make such a good point just about the fact that we can all understand the statistics to some extent and become more critical consumers of the data that’s out there for us right now.
So let’s go ahead and get right into some of the questions about the statistics of COVID. So the first question that I have is, when we see statistics or models presented talking about the pandemic, what is your best advice to help people make sense of all of it? What do we need to do or to be asking ourselves and keeping in mind when we evaluate them?
Dr. Rob Gould: So it’s a big question. I think one of the fundamental lessons of statistics is that context matters, that it’s very important that we interpret our conclusions in the context that the data were collected. And in this case, the context is incredibly complex. So I think the hard news is that we are not going to find an easy, one-answer-fits-all lesson I can give you that would tell you how to make sense of the data and the models and all that you are seeing.
But there are some basic questions I think we can all kind of consider. I mean one is how were the data collected? Were they representative of the group that you are trying to apply them to? What’s the process through which the data are recorded? Another really interesting question is to ask who is missing from the data?
I have just been reading this great new book called ‘Data Feminism’ by Catherine D'Ignazio and Lauren Klein and one of the questions they ask is -- or they say we should always ask is who is missing from the data? So for example, are homeless people as well represented in these counts as people from suburban neighborhoods are? So there are all sorts of critical questions that we should be asking and trying to understand the answers to when we are interpreting these data.
You asked about models, and I think it’s important that a lot of the conclusions that we make in statistics are filtered through these lenses of models. I think one lesson is to understand models are not reality, that’s why they are called models. And what a model does, just like a model airplane, is capture some aspects of reality, but not others, and so you want to make sure when you are reading about someone’s model that you are asking what aspect of reality are they trying to capture?
Maybe asking if they captured it well is a tough question because it requires some expertise, but you want to know is this model meant to apply towards the sort of questions I am wanting to ask. I think that’s kind of an important thing to keep in mind when we are hearing about all these different models that are coming out.
Delaney Henson: Yeah, yeah, that’s such a great point about the models and also just about making sure that we are considering the representativeness of the samples that we are taking in as data right now.
Moving right into the next question, I see graphs and data everywhere in the news right now, so how can we use them to try to better understand the pandemic and all its related health, economic and social consequences and maybe what are the limitations of statistics right now?
Dr. Rob Gould: Yeah, so all of these are really big questions, which is good because we are in a really big crisis.
So looking at graphs, there are all sorts of graphs and there are many books out there about -- and many chapters in the books about how not to read -- how to produce bad graphs and how to be wary of people trying to deceive you through graphs and I could go through that list, but I think that’s not so beneficial, or at least that would take an hour or more.
But I think one difference between how laypeople look at graphs and how statisticians look at graphs is when we are trying to view graphs, as in all of statistics, we are looking at the variability. And I think when you are reading a graph it’s important to know that the point you see is not fixed in stone. It’s produced by a complex process that causes variability.
So I will just show you one of the graphs I have been looking at, which is a graph of counts in LA County. So let’s see if I can do this.
So I mean when I am looking at a graph like this, each of these dots -- each line represents a county or a city or a neighborhood in Los Angeles and each point represents a count and this comes from the LA County dataset. And what I am realizing when I see this is that there are many sources of variation.
One source is that the infection is growing, the virus is growing, so there is some spread. But another source is that screening test availability and accessibility are growing. So that’s another source.
Another source is that there is some chance variation in the reporting or in the measurement itself, so some of them are false positives, some are false negatives.
There are some errors in the reporting. I have seen some scores go down on some days, not because the number of people changed, but because maybe the errors were corrected. And then there are just chance errors in the entry of data and random variation. So there are all sorts of -- different sources of variation that we need to be aware of when we are interpreting graphics like this.
Delaney Henson: Yeah. Thank you. Thank you so much for sharing that graph. I think that’s so insightful to be able to take data like that and to interpret it in a graph like that for us.
Moving on to my third question, as a news consumer and someone with a particular interest in communications and journalism, I am wondering if there is any way to know if the data or statistics we are seeing are trustworthy? In other words, is there a way for the average person to tell which sources are most reliable?
Dr. Rob Gould: Unfortunately, data sources don’t come with a big sticker of endorsement that says 99% factual and it would be great if they did. I think the best lesson is to stick to reliable sources.
So like the data I showed you came from the LA County Health Department. So is it flawless? I doubt it. I mean it was collected by humans, it was posted by humans. In fact, I have a lot of complaints about how the data are displayed, because they are displayed in a way that makes it really tricky for people to get access to them. So it’s almost as if they didn’t really want us to analyze this data. But still it’s our LA County. So it’s a government source.
The CDC is another trustworthy source for data. They have protocols and things that they do their best to make sure the data are as reliable as possible.
Johns Hopkins University, among others (other places have as well), has become known as a nice kind of curation of reliable data sources. And again, the important thing to remember is context matters. So if you are looking at a data source you should trust it more if it’s giving you more about the context: who collected the data, when the data were collected, how they were collected. Especially with something like this, the historical perspective matters a lot.
So one of my criticisms about the LA County is it just shows you today. So the numbers are always going up. They can’t go down because it just shows you the total number of cases today, but what you really want to know is how does today compare to yesterday and so you want to be able to find sources that are kind of addressing the full context.
I guess the other thing I would add to that is because again context really matters here, that means that experts who have deep knowledge of the context, experts in epidemiology and virology and biostatisticians and other people working in this field are to be trusted over dabblers such as myself who are looking at things but don’t have that deep context knowledge.
So there is a lot of danger in wading into new areas where we don’t understand the full context. And so I think an important lesson is to make sure when you are reading things that the people are indeed experts in what they are writing about.
Delaney Henson: Yeah, I think that’s such a great point. You mentioned the wider context and also just that we are not only looking at a graphic or reading an article, but also understanding where that information is coming from, I think that’s important now more than ever.
Moving along to my fourth question, as a statistics professor, what do you think should be done to improve statistics education? And on a more general scale, data literacy in the United States, how do we increase the ability of people to understand data?
Dr. Rob Gould: Yeah, so that, for me, is the $100,000 question. I mean that’s what I really want to focus on and I think -- my answer is complex I guess. As I am thinking about what’s missing, one thing that it seems we do too much of in ours -- so most college students and many high school students take an introductory statistics course, and not to speak for all courses, but there is a tradition of these courses I think being much too mathematical and focusing on statistical inference to the detriment of some other really important topics that would be helpful for people to know right now.
So I don’t want to diminish the importance of statistical inference because it teaches us one great lesson, which is that you need a representative sample to be able to make conclusions about the population, and to date we have no random sample anywhere about this -- at least in this country that I am aware of. So that’s an important lesson.
But another thing we lose when we get too involved in all the mathematical details of learning about all the different sorts of tests and the different scales of measurement is teaching how to analyze data, teaching some of the basics of data analysis, teaching the fundamentals of statistical thinking, like learning to look for sources of variability and understanding the omnipresence of variability.
I think we need to teach more of what my colleague Tim Erickson calls data moves, which are kind of like basic computational moves that help people manage and wrangle data.
So again, I will just show you a quick -- I mean this is what the data looks like from the LA County Health Department. So it’s just a table like this. When you do homework in statistics class everything is just in a neat table-like form for you to begin with, but with this table, this is the data you want, but you can’t easily access this, and so we need to teach data moves so that people can access the data that they want to see.
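One way to picture the kind of "data moves" mentioned here is the rough Python sketch below. It takes text copied from a web table, parses it into records, and then filters, summarizes and sorts it. The table contents, column names and function name are all hypothetical, and treating filtering, summarizing and sorting as representative moves is an assumption, since no specific moves are listed in the talk.

```python
import csv
import io

# Hypothetical text copied from a web table (made-up places and counts,
# not actual LA County Health Department data).
raw_table = """Community,Cases
Area A,12
Area B,0
Area C,45
Area D,7
"""

def read_rows(text):
    """Parse comma-separated text into dicts with integer case counts."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"community": r["Community"], "cases": int(r["Cases"])} for r in reader]

rows = read_rows(raw_table)

# Filtering: keep only communities that report at least one case.
with_cases = [r for r in rows if r["cases"] > 0]

# Summarizing: collapse all rows into a single total.
total_cases = sum(r["cases"] for r in rows)

# Sorting: order the remaining communities from most to fewest cases.
ranked = sorted(with_cases, key=lambda r: r["cases"], reverse=True)

for r in ranked:
    print(r["community"], r["cases"])
print("Total:", total_cases)
```

Once the table is in a list of records like this, each further question (which areas, how many in total, in what order) becomes a one-line move rather than a manual reading of the page.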
So I think those are the four things. I guess that’s deemphasizing statistical inference, not eliminating it, but deemphasizing it, emphasizing data analysis, teaching data moves and really emphasizing this role that variability plays.
One last thing I guess I would add is that -- that we in statistics do have kind of a way of thinking about problems that I think is not different from other ways of critical thinking and evaluation and analysis, but I like to call it the data cycle. I will show you that right here. This is based on the American Statistical Association’s four stages of statistical investigation and basically it just is a roadmap for maybe what you -- how you might proceed in thinking about data and I think every student needs to learn this.
We are teaching it to high school students in our Introduction to Data Science Program, at LA Unified High School and Centinela Valley High School, and other regional school districts. So you start by asking questions like how are the COVID-19 cases changing and then you have to find data that are reliable and that address that question and then you analyze and you interpret and you cycle through this, because your interpretation usually -- usually leads to more questions.
So I mean I think if people are more familiar with strategies and plans for how to approach problem-solving when it comes to data, I think people will be more competent or I should say confident, hopefully more competent but definitely more confident in their ability to evaluate other people’s analyses as well.
So when they’re reading the newspaper or watching the news or reading a blurb on Instagram or Twitter they will better understand what that might mean if they are better -- if they have had experience analyzing data themselves.
Delaney Henson: Yeah, I think -- I think that’s such a good point about really enabling students to be able to understand how to work with data and to analyze data even in most introductory statistics courses because statistics and data are something that’s going to -- that are very essential to our lives and that’s evident now more than ever.
So thank you so much for sharing and we do want to take some questions from the viewers who have joined us. So I would like to now send it over to Laura for some questions that have been coming in from our audience.
Laura: All right! Great! Thank you. So I have been watching the -- the questions box and the chat box since we started here, a lot of good questions coming in. First, a really simple question for you: a number of people have asked you to repeat the name of the book that you mentioned earlier that you’re reading, about the missing data. A lot of people were interested in that. So a super-easy question right off the top.
Dr. Rob Gould: Yeah, so it’s called ‘Data Feminism’. I am looking now, I mean maybe I could just -- I know nothing about the author, this is not a -- a blurb, but I mean I think if you google Data Feminism, it will come right up. So I have just started reading this myself. It’s a very fascinating book and really makes us think more critically about the role that -- that data play in our lives. The other book I just highly recommend is Cathy O’Neil’s book called ‘Weapons of Math Destruction’ which is one of the best titles ever of any book.
Laura: That is.
Dr. Rob Gould: The ‘Weapons of Math Destruction’ is another book on this topic that I highly recommend.
Laura: Okay good. All right! So let’s dive into some other questions. So we have somebody who asked a really interesting question about the media and how they’re reporting stats and sort of what’s wrong with media reporting right now, and it’s interesting because you mentioned every day we see the case counts going up and up, and I don’t know how it is. I live -- I live in Atlanta, Georgia and every night they start off the nightly news saying we have x number of cases in Georgia and it just keeps going and going and going.
But that doesn’t really tell you the whole story, so I am interested and some of our viewers are interested in finding out your impression of how the media is treating some of this data and statistics right now.
Dr. Rob Gould: Yeah, well, I think it depends on the media. I -- I think again, context is key and -- and a historical context and a general context is really important.
So I think some media outlets, especially those that have more time, so maybe those are newspapers or radio shows that can devote more than a minute to a topic, are probably capable of getting into some of the nuances. When I was talking about the counts always going up, I was kind of referring to the fact that LA County only reports the total number of cases to date.
And so that count will never go down in a million years, it’s always going to be at least the size it is today and what we really want to know is what’s the change, we are looking to see what’s the rate of change, because we want that rate of change to start to become negative. We want to see that today there were fewer cases than yesterday.
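A minimal Python sketch of this point about cumulative totals: differencing a running total recovers the daily new cases, and differencing again shows whether new cases are rising or falling from one day to the next. The function names and the numbers are hypothetical, made up for illustration; they are not LA County data.

```python
def daily_new_cases(cumulative):
    """Difference consecutive cumulative totals to get new cases per day."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]

def change_in_new_cases(cumulative):
    """Difference the daily new cases; negative values mean fewer new cases than the day before."""
    daily = daily_new_cases(cumulative)
    return [today - yesterday for yesterday, today in zip(daily, daily[1:])]

# Hypothetical cumulative totals for five consecutive days (not real data).
cumulative_totals = [100, 130, 170, 200, 220]

new_cases = daily_new_cases(cumulative_totals)    # [30, 40, 30, 20]
change = change_in_new_cases(cumulative_totals)   # [10, -10, -10]

print("New cases per day:", new_cases)
print("Day-over-day change in new cases:", change)
```

In this made-up series the cumulative total rises every day, yet the last two values of the day-over-day change are negative: the total keeps growing while new cases are already falling.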
So I mean we want to look for things like that. I think it’s kind of good to keep in mind that the news reports on things that are unusual, that’s almost what the word “news” means. You -- you don’t sell newspapers or get viewers to TV shows to say nothing really interesting happened today.
So you have to find cases that are exceptional from the status quo, and I think what you want to do is not -- as consumers we need to not be distracted by the exceptional cases and we need to be focusing on what’s the status quo.
So what are -- what is the vast majority of experts, what are they saying, and not focus on maybe what a few dissenters here and there are saying or on a few unusual opinions. Certainly maybe it’s interesting that those opinions exist, but one reason they are unusual is they are probably extraordinary, and -- my favorite skeptic James Randi says extraordinary claims require extraordinary evidence. So it’s a good thing for all of us to keep in mind right now I think.
Laura: Yeah, that’s one thing that I think is interesting and someone asked about is how unusual is it for people to sort of disagree on what the statistics say because you’ll see a report come out one day and then the next day a bunch of other people sort of come out of the woodwork and say, well, that report or that data isn’t sound or I disagree with it. So is that normal in the statistical, the mathematics, the scientific community?
Dr. Rob Gould: Yeah, I think it is. I mean, I think normally what happens is someone writes a paper on some research that they have done and maybe they distribute it and they get some criticism and -- but then it goes to a journal and some referees just dig into it and tear it apart and then it gets published and people in the field consider it, and they try to pick as many holes as they can. And normally there’s a back-and-forth and hopefully it iterates towards a strong understanding of what’s really happening. Probably that understanding rarely comes from a single research study. It usually comes from a collection and from that -- that dialogue, but ordinarily that’s happening kind of backstage, if you will, and it’s happening slowly.
Now what we’re seeing is there’s no backstage, the curtains have been torn down. We are seeing these arguments happen in real-time so we are getting lots of back-and-forth that is understandably confusing. You never know from day-to-day whether to wear that mask and what kind of mask you should be wearing, but we are learning as we go along and so this is part of the process of learning.
Laura: So someone wrote, and I think this is an interesting question: how long do you think it will be before the CDC or others, for example, will have really good statistical data on the pandemic? Is it going to be years, is it going to be decades? And I think this goes to the whole idea of, as we are in the middle of this and this sort of fog of the pandemic if you will, like can you -- can you really understand statistically what’s going on or is it going to take kind of a rearview look at this to -- to fully understand what’s happened?
Dr. Rob Gould: Well, I mean, I guess I would say everything benefits from a rearview -- a rearview look, but I guess I am not too comfortable answering that question, it’s not my area of expertise, I have not been involved in the data collection or in reviewing papers or reports about this topic in particular. So I really couldn’t comment on that.
Laura: Okay, so here’s one from a faculty member. So any suggestions on resources for teaching stats, modeling with the COVID-19 data either websites or kind of teaching modules or things that you could recommend for faculty members who are on the call?
Dr. Rob Gould: Yeah, that’s a great question and I wish I had been prepared with a list because I have seen lists recently. I think one thing I would advertise, especially if you are teaching statistics at a smaller university or college, is -- there’s an excellent Listserv called IsoStats. That’s a group of isolated statisticians and many of us are on that list even though we are not that isolated, and they have been sharing lots of resources recently that address those topics.
So I recommend just kind of googling IsoStats and seeing if you can get -- all you have to do, once you find it, is ask to be joined to the list and you will be. Another place again to start is with the Johns Hopkins Dataset; I mean that at least gives you some data. I also -- gosh, again, I wish I had -- were better prepared for this, but I think if you google something along the lines of ten cautions for interpreting graphs in COVID, you will get this great essay about ten mistakes, ten things you should not do when creating statistical graphics about this virus.
Also, this is really hard to spell and say out loud but my colleague Mine Çetinkaya-Rundel is at -- she was at Duke University and now she is at University of Edinburgh, she’s compiling a webpage that has resources. So I don’t know quite how to get her name to you, I am going to see if I can open a window and type it in, but Mine Çetinkaya-Rundel.
Laura: Great, okay good. Those are -- I think that’s actually a really kind of good list of things for -- for people to take a look at. I think we have probably got time for one more question before I throw it back over to Delaney. And you have shown some of the things that you are looking at from the LA County Health Department, but how can people use data really to guide their own decisions about what’s happening in their community and help them navigate this whole environment?
Dr. Rob Gould: Yeah, I think that answer is going to depend a lot on what community you live in. I think many governments, local, state and -- and the Federal Government, have spent a lot of resources in recent years on an open data movement, which is making datasets that are collected at taxpayer expense available to taxpayers, and unfortunately that movement is very uneven. So some cities have fantastic datasets, and I am lucky to live in -- in one of them: both the city and the County of Los Angeles and all of the surrounding cities like Santa Monica and Covina and Long Beach have fantastic data repositories online that the public can access and that can help them address the questions they may have about -- about COVID and other things, and other counties and cities have much less. So the answer is it depends, but again I think if you don’t have access to data, I mean the next best thing is again to pay attention to the context and to people who are familiar with the context of both medicine and viruses and epidemiology because they will probably -- no one knows anything with certainty and as a statistician I appreciate that very much, but some uncertainties are smaller than others.
Laura: Okay, good. So actually there is -- there is one more I want to ask because it’s from a student, and I think this is a really nice question. So this is a student who has really been enthused about mathematics and using math to inform pandemic preparedness and response, he wants to pursue a career in the area of statistics and mathematics. So what kind of academic preparation do you recommend for him, that’s what he is asking?
Dr. Rob Gould: Well, that’s a very nice question. So it boils down to three things I think, you want to take as many statistics courses as you can, you want to take as many math courses as you can, and you want to take as many computer science courses as you can. And you can emphasize one of those three areas over the other and that will kind of help emphasize the one that you’re most interested in and that’s going to kind of help filter you into one set of graduate schools rather than another but you can’t go wrong with either choice.
Laura: All right, so some good career advice for Kyle Kay. So, Kyle, I hope that helped, and on that note it’s a good note to throw it back to Delaney. So wrap us up.
Delaney Henson: Thanks Laura and thank you so much for your time, Dr. Gould. I would also like to thank everyone who joined us today. I think that there is certainly something for everyone to learn from this conversation and in a larger context from the data that surrounds COVID-19.
So I hope that this conversation was just as insightful for all of you as it was for me, and in the meantime, while we continue to ride this out physically distant but ultimately together, stay safe and be well.
Dr. Rob Gould: Thank you Delaney! Thank you for hosting!
Delaney Henson: Thanks Rob!
Laura: Thank you!