A side project I am working on (more on that later if I can manage to blog more than once a year) has me needing to interact with several datasets that are broken down by census tract. This means that, among other things, I’m going to be outputting a lot of Tacoma and Pierce County map visualizations, and need to be able to outline and mark these tracts in a variety of ways.
The Obama administration has been pretty big for data science. They appointed the country’s first official Chief Data Scientist, and launched data.gov, a clearinghouse for government-generated public data sets. President Obama also signed an executive order declaring open and machine-readable formats as the new default for all forthcoming government information resources, guaranteeing the continuing availability of interesting data.
Beyond the practical use of generating actionable conclusions from public data, any repository of up-to-date, varied data sets is invaluable to beginners looking to hone their skills and build a portfolio of projects.
But self-directed amateur research can also present a significant roadblock. Exploratory analysis (essentially poking around in the data and learning interesting things) is important, but the real skill of an employable data scientist is the ability to frame and answer a well-defined question.