21.2 Read edgelists
Now that we've gotten used to working with the igraph package, let's load in some serious data. On my website at jaredlander.com/data/routes.csv is an edgelist of a fictional airline. Let's take a look at it. We'll say jets <- read.table("http://www.jaredlander.com/data/routes.csv", sep=",", header=TRUE, stringsAsFactors=FALSE). This is just reading in a normal CSV file, so we need all the pertinent information: sep=",", header=TRUE, and, on the next line so that you can read it, stringsAsFactors=FALSE. We run that and then take a look at it with head(jets). At the head of it, we see we have information on planes going from one airport to another airport and the time in minutes that the journey takes. This is called an edge list because each row represents an edge, for instance the edge going from Austin to Las Vegas, or from Austin to Portland, and there happens to be additional information, such as the time.
So let's store this as a graph. We say flights <- graph.data.frame(jets, directed=TRUE). We pass in the data frame we're using, which is jets, and we tell it directed=TRUE because going from Austin to Las Vegas could be different than going from Las Vegas to Austin, and since these are planes, that makes sense because of the jet stream. So we run it. Now let's print it out and see what it looks like: print.igraph(flights, full=TRUE). We have a lot of information, and it's basically showing, for each of the nodes, where you can go: from Los Angeles you can go to Austin, to Boston, to Chicago; from Chicago you can go to Los Angeles, Las Vegas, Palm Springs, et cetera, et cetera.
But again, this is network analysis, so it's helpful to plot it. We say plot(flights), and I will zoom in on this. You can see it's kind of a hard-to-read graph, with all these arrows getting in the way, but it shows you all the different airports and where they connect to. It's sort of like a route map, but it's displayed according to this layout instead of over the United States.
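The read-then-convert steps described above can be sketched without a network connection. This is only an illustration under stated assumptions: the five rows and the column names From, To and Time are made up here and may not match the real routes.csv, and graph_from_data_frame is used as the current igraph name for the graph.data.frame call in the walkthrough.

```r
library(igraph)

# Hypothetical stand-in for routes.csv: a tiny edge list with ASSUMED
# column names From, To and Time (minutes). The real file may differ.
csv_text <- "From,To,Time
Austin,Las Vegas,170
Las Vegas,Austin,160
Austin,Portland,250
Portland,Austin,240
Chicago,Austin,165"

# Same arguments as in the walkthrough, reading from text instead of a URL
jets <- read.table(text = csv_text, sep = ",", header = TRUE,
                   stringsAsFactors = FALSE)
head(jets)

# The first two columns are taken as the endpoints of each edge;
# any remaining columns (here Time) become edge attributes
flights <- graph_from_data_frame(jets, directed = TRUE)
print(flights)
```

Because each row is one edge, the graph ends up with as many edges as the data frame has rows, and one vertex for each distinct airport name that appears in either of the first two columns.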
Going back to R, we can check out all the edges by saying E(flights). We now have a listing of all the edges, and if we scroll through it we see which airport connects to which airport. Likewise, we can get a list of all the vertices by saying V(flights), and now we just have a listing of all the different vertices. Maybe I want to know how many vertices there are. We say vcount(flights): there are 20 vertices. Then the number of edges, and here's what's really important in R: function names are very sensitive, so you need to use the exact names. In our case it's not ecounts, it's ecount. We say ecount(flights) and we see there are 192 edges.
So far this has been a directed graph. You can also work with undirected graphs. We'll make a new object that takes this directed graph and makes it into an undirected graph, and we can see how it acts a little differently. So we say flights2 <- as.undirected(flights), and I will use autocomplete to finish that. We can now print out the graph with print.igraph(flights2, full=TRUE) and see how it's no longer printing an edge as going to Boston; instead it shows Palm Springs and Boston as reciprocal. If we plot it, we can see the graph no longer has arrows. Direction doesn't matter now; not all graphs are going to be directed and not all graphs will be undirected.
We can look at this new undirected graph and check out its edges by saying E(flights2), and notice how each edge just goes either direction. We can also do ecount(flights2): we now have 98 edges. That's because before, going from Austin to Portland was different than going from Portland to Austin. There were two edges, but now Austin to Portland is just one edge. If we check out the number of vertices, that shouldn't change, because the airports are still the same; we just changed the number of edges.
Oftentimes when booking flights, you're worried about how long it's going to take, so you usually want it to be as fast as possible. So it would be great if we could see this graph weighted by the time it takes between places.
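To make the effect of dropping direction on the counts concrete, here is a small sketch on made-up routes (not the real data). The data frame, its column names, and the one-way Chicago route are all assumptions for illustration. Note that recent igraph releases may flag as.undirected as deprecated in favor of as_undirected; the older name is kept here because it is the one used in the walkthrough.

```r
library(igraph)

# Hypothetical routes: two round trips plus one one-way route
routes <- data.frame(
  From = c("Austin", "Las Vegas", "Austin", "Portland", "Chicago"),
  To   = c("Las Vegas", "Austin", "Portland", "Austin", "Austin"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(routes, directed = TRUE)

vcount(g)  # 4 vertices
ecount(g)  # 5 directed edges

# Collapse each pair of opposite edges into a single undirected edge
g2 <- as.undirected(g, mode = "collapse")

vcount(g2)  # still 4: the vertices are untouched
ecount(g2)  # 3: each round trip becomes one edge, the one-way route stays
```

In this toy case the undirected count is more than half the directed count because one route has no return leg. That is one way an undirected edge count can end up slightly above half of the directed one.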
So we will say plot of flights, with layout=layout.fruchterman.reingold and edge.width equal to the time variable of E(flights), divided by 100 just to give it a bit of scale. Our plot might become a little bit harder to read, but it has these weights, so it gives you a sense of how long it takes to go places. In this case, the thinner the line the better, because that means it takes less time. This will become important when you try to route between cities and there are no direct flights available but there are flights with different stops; you have to pick out the combination of flights that gets you there as quickly as possible.
As we saw for simpler graphs, there are a number of different layouts we can use. For example, we have the kamada.kawai layout. Again, these are just different ways of looking at the graph. We won't go through all of them because we've seen them before, but they're very useful to see.
Loading up graph data in R is pretty easy. You start with an edge list, read it in as a CSV, and convert it to a graph. You can then go through and plot it in different ways, have it directed or undirected, and get counts of the vertices and counts of the edges to find out all this basic information about your graph.
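The weighted plot described above might look roughly like this. It is a sketch under assumptions: the routes and the edge attribute name Time are invented for illustration, and layout_with_fr and layout_with_kk are used as the current igraph names for the Fruchterman-Reingold and Kamada-Kawai layouts mentioned in the walkthrough.

```r
library(igraph)

# Hypothetical routes with an ASSUMED Time column (minutes)
routes <- data.frame(
  From = c("Austin", "Las Vegas", "Austin", "Chicago"),
  To   = c("Las Vegas", "Austin", "Portland", "Austin"),
  Time = c(170, 160, 250, 165),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(routes, directed = TRUE)

# Pull the edge attribute and scale it down so line widths stay readable
widths <- E(g)$Time / 100

# Same graph, two layouts; thicker lines mean longer flights
plot(g, layout = layout_with_fr, edge.width = widths)
plot(g, layout = layout_with_kk, edge.width = widths)
```

Dividing by 100 is only a display choice: the raw minutes would draw lines far too thick, and any other constant that keeps the widths in a readable range would work equally well.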