Drawing Pierce County: Census Maps in ggplot2

A side project I am working on (more on that later if I can manage to blog more than once a year) has me needing to interact with several datasets that are broken down by census tract. This means that, among other things, I’m going to be outputting a lot of Tacoma and Pierce County map visualizations, and I need to be able to outline and mark these tracts in a variety of ways.

So! Here is a rundown of exactly how I am pulling the census shapes, and squooshing them into ggplot2 in R. First, we’re going to need a library or two. Or six. Probably six.

Government Data, Big and Small

The Obama administration has been pretty big for data science. They appointed the country’s first official Chief Data Scientist, and launched data.gov, a clearinghouse for government-generated public data sets. President Obama also signed an executive order declaring open and machine-readable formats as the new default for all forthcoming government information resources, guaranteeing the continuing availability of interesting data.

Beyond the practical use of generating actionable conclusions from public data, any repository of up-to-date, varied data sets is invaluable to beginners looking to hone their skills and build a portfolio of projects.

But self-directed amateur research can also present a significant roadblock. Exploratory analysis (essentially poking around in the data and learning interesting things) is important, but the real skill of an employable data scientist is the ability to frame and answer a well-defined question.