A side project I am working on (more on that later if I can manage to blog more than once a year) has me needing to interact with several datasets that are broken down by census tract. This means that, among other things, I’m going to be outputting a lot of Tacoma and Pierce county map visualizations, and need to be able to outline and mark these tracts in a variety of ways.
So! Here is a rundown of exactly how I am pulling the census shapes, and squooshing them into ggplot2 in R. First, we’re going to need a library or two. Or six. Probably six.
Tigris provides the census shapefile data. Graphics come from ggplot2 and ggmap. Broom and maptools help reshape the data into a friendly data.frame. And bit64 keeps the long census IDs from breaking a normal integer field.
library(tigris) library(broom) library(ggplot2) library(ggmap) library(maptools) library(bit64)
Next up, we need to retrieve the spatial data for all of the census tracts in the county. In addition to giving us the regions to draw later, it will help us define the limits of the base map. Tigris needs identifiers for the state (it would prefer the code, but it will take the abbreviation) and the county.
county_cols = c("state", "statefp", "countyfp", "county", "classfp") wa_counties <- read.csv("https://www2.census.gov/geo/docs/reference/codes/files/st53_wa_cou.txt", header=FALSE, col.names=county_cols) pierce_code <- wa_counties[wa_counties$county == "Pierce County", "countyfp"] pc_tracts <- tracts(state="WA", county=pierce_code)
If we were drawing a fancy map with leaflet, this shapefile would be great. But ggplot2 (and other ease of merging tasks) would prefer a nice and tidy data.frame.
pc_tidy <- tidy(pc_tracts, region="GEOID") head(pc_tidy)
## long lat order hole piece group id ## 1 -122.4420 47.28712 1 FALSE 1 53053060200.1 53053060200 ## 2 -122.4323 47.29602 2 FALSE 1 53053060200.1 53053060200 ## 3 -122.4327 47.29623 3 FALSE 1 53053060200.1 53053060200 ## 4 -122.4363 47.29842 4 FALSE 1 53053060200.1 53053060200 ## 5 -122.4366 47.29883 5 FALSE 1 53053060200.1 53053060200 ## 6 -122.4366 47.29898 6 FALSE 1 53053060200.1 53053060200
Now we have a whole bunch of data. Time to start plotting! In particular, note the group field, which separates out each individual polygon. I missed this step the first time around, and it gave me some very interesting maps, as it tried to draw the whole thing with one continuous line.
ggplot() + geom_polygon(data=pc_tidy, aes(x=long, y=lat, group=group), color="black", fill="white")
Ta da! It’s Pierce County! All done.
Okay. Well. Some folks may have an ingrained knowledge of the county’s geographic ins and outs, and be happy with just the outlines. Others might like to see some identifying information, some sign of city names and bodies of water and such. For that, we’ll use ggmap’s get_map function to pull our backdrop from a remote tileserver. There are a few options here. I went with a Stamen map (http://maps.stamen.com/), because it was easy to use, and visually unobstrusive.
In order to pull the map, first we’ll need a bounding box, specifying the furthest south, west, north and east (in that order) we’d like to go. For this, we can just use the minimum and maximum longitudes and latitudes from our set of points.
bounds <- c(min(pc_tidy$long), min(pc_tidy$lat), max(pc_tidy$long), max(pc_tidy$lat))
Now we’ve got enough info to get the map, and draw it, too.
pc_map <- get_map(location=bounds, maptype="toner-lite", source="stamen") ggmap(pc_map)
Drawing the census edges is as simple as adding our geom_polygon back in. We’ll also render the polygons transparent, so they don’t cover up our helpful geography, and add in some color so they stand out.
ggmap(pc_map) + geom_polygon(data=pc_tidy, aes(x=long, y=lat, group=group), color="red", alpha=0, size=0.3)
Pierce County! With lines around its bits! Of course ggplot2 provides all manner of prettification opportunities, but that’s not really the goal of the day, so I’ll leave things there. Next time: less code! More maps! Actual data!