Parking, Personal Information, & Privacy at Point Ruston

A few months ago, payment machines went up at Point Ruston (a popular new mixed-use development here in Tacoma, for the out-of-town readers), and it became clear that the formerly free parking would soon become less so. Kate Martin at The News Tribune got the lowdown on the forthcoming setup. Plenty of folks grumbled, as they’d gotten used to having parking for the nice waterfront family area, regardless of whether they were shopping.

This is not, for the moment, my concern.

What stands out to me is exactly how payment and validation will be accomplished. In most garages, you either pay by slot number, pay for a ticket to put on your dashboard, or take a ticket and then pay on exit. Here’s how it works at Point Ruston:

  • A photograph of your license plate will be taken on entry and exit, and computers will read the information from the picture.
  • If you stay more than an hour, parking will cost you.
  • If you want to pay at the machine, punch in your license plate number and pay for your time.
  • If you want to get validated, take your license plate number (or a picture of it) to a business. They will enter it in the system, which will give you some free time.
  • Validation doesn’t stack, but it does reset with each new entry.
  • If you don’t pay, they can look you up in the state’s license plate registry and bill you after the fact.
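As a rough illustration, the rules above could be sketched in Python like this. The one-hour free period comes from the list; the hourly rate, the amount of time a validation grants, and the function names are made-up assumptions, and reading "doesn't stack, but resets" as "only the latest validation counts, starting from when it was entered" is one interpretation, not Republic Parking's documented logic.

```python
from datetime import datetime, timedelta
from math import ceil

FREE_PERIOD = timedelta(hours=1)        # from the post: the first hour is free
VALIDATION_CREDIT = timedelta(hours=2)  # hypothetical: free time per validation
HOURLY_RATE_CENTS = 200                 # hypothetical rate


def free_until(entry, validations):
    """Latest moment the car is covered without paying."""
    covered = entry + FREE_PERIOD
    if validations:
        # No stacking: only the most recent validation counts, and it
        # restarts the clock from the moment it was entered.
        covered = max(covered, max(validations) + VALIDATION_CREDIT)
    return covered


def amount_due_cents(entry, exit_time, validations=()):
    """Charge for each started hour after the covered period ends."""
    overtime = exit_time - free_until(entry, validations)
    if overtime <= timedelta(0):
        return 0
    return ceil(overtime / timedelta(hours=1)) * HOURLY_RATE_CENTS


noon = datetime(2018, 6, 1, 12, 0)
one_pm = noon + timedelta(hours=1)
half_past_three = noon + timedelta(hours=3, minutes=30)

print(amount_due_cents(noon, noon + timedelta(minutes=45)))  # 0
print(amount_due_cents(noon, half_past_three))               # 600
print(amount_due_cents(noon, half_past_three, [one_pm]))     # 200
```

Note that every function here is keyed on nothing but timestamps. The plate number only matters for deciding whose visit this is, which is exactly the part the rest of this post is about.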

At first glance, it does feel a little creepy that they are photographing your license plate, but is it really all that different from other methods? I would say yes, and I would say it in several ways.

(Caveat for the rest of this post: I am not a lawyer or legal expert of any sort. I am a guy who knows very well how data can be used to paint a picture of behavior, with some experience reading privacy policies. Take that for what it’s worth.)

Your License Plate is Personal Information

If you push a button for a ticket at the entrance, and pay in cash at exit, there is an electronic record that SOMEONE parked for that amount of time, but not who. That’s why they put the gate up until you pay: they aren’t set up to know who you are if you try to skip out. They can go back and review security footage if you crash through the barrier, but they haven’t fundamentally associated your transaction with your identity.

But who pays cash these days? Can’t they track you by your credit card number? Well, yes and no. There are a LOT of legal hurdles to the processing of digital transactions, and restrictions on the use of the data. Businesses are in much better shape tracking your behavior if they have another method. This is what rewards cards at retail stores are for—they give you a little bit of free stuff, and you give them your name, phone number, and a full transaction history without the legal baggage of payment info.

But now, instead of just your payment info and a ticket, the garage is explicitly tracking something that can be uniquely associated, if not exactly to you, then at least to your household. Whether or not they know exactly who you are, they do know the overall usage pattern of that car.

Okay, but how much does that really tell them about me?

There are legal restrictions on license plate lookups in Washington state. The Department of Licensing has a Contracted Plate Search system that qualified businesses (those with a valid business need to get that info) can access to associate license plates with registered vehicle owners.

Republic Parking (who operate this garage) almost certainly have access to the system. After all, they need to be able to bill or ticket delinquent drivers, and manage towing of abandoned vehicles. However, since the system is intended for these purposes, and not for building a customer database, they would be on squishy ground if they tried to use it for the latter. I suspect that the DOL wouldn’t be happy about that, so let’s be nice and assume that Republic doesn’t look up your identity except for entirely kosher reasons.

Now let your mind linger for a while on what I said above, about businesses avoiding legal problems with payment info by getting you to hand over information all on your own. We’ll come back to that.

So what else is this data for?

Like any good business that collects data and processes transactions, Republic Parking has a Privacy Policy and a Terms & Conditions document. It’s generally a good idea to read both, to get a clear(ish) picture of what’s going on with data. I’ll be focusing on the Privacy Policy here, but it’s interesting to note that the Privacy Policy itself doesn’t list “license plate” as Personal Information covered by the rest of the policy. However, the T&C does call out license plate as Personal Information, so taken together, the bases are covered.

Privacy policies are full of legalese, both to cover all possible asses and to discourage laypeople from bothering to read them, but here are a few highlights on three of the most important privacy aspects of any data-driven system: data use (what else are we using this for?), data sharing (who else will we give this to?), and data retention (how long will we keep this?).

  • “Republic Parking will only use the information collected for the purposes specified during the collection process identified in this Policy or as authorized by law.”
  • “Republic Parking will only use the Personal Information as specified during the collection process to conduct the related business transaction, for other reasonable business purposes, or as authorized by law.”
  • “Republic Parking may disclose your Personal Information to third parties that are directly involved with the processing or storage of that Personal Information, for specified business purposes such as payment processing, unless otherwise prescribed by law.”
  • “Republic Parking only retains Personal Information for as long as reasonably needed for business purposes and as authorized by law.”

There is a phrase you’ll see a few times, in a few forms: “reasonable business purposes.” To me, reasonable business purposes means “things we haven’t thought of yet, or things that make sense for our business but that we don’t want you to immediately think of when you give us your info.” This is universal ass-covering: they will use your data for whatever they think is good for business, and keep it for as long as it is useful, while still being able to tell the Trib things like “Data collected by the system is maintained in Republic Parking’s system, and old data is eventually overwritten. It is not sold to a third party.”

So they know when I park. So what?

It’s not like a parking garage is going to market parking to you, right? This doesn’t tell them a ton, beyond general time frames. To really track your behavior, they’d need to know what individual businesses you visited, and it’s not like every business will know your license pla…


…oh.

Want to park for free? Get validated at a business! But this isn’t just a stamp on your ticket: each business in Point Ruston is equipped with a terminal connected to the Republic Parking system. Hand them your license plate number, they punch it in, and your account is automatically credited with the appropriate number of hours. And Republic knows which business you are at.

Need more time? Validate again, and it will reset the clock. You are strongly encouraged to log your license plate at every single business, so that you are always covered. And Republic always knows where you went.
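To make that concrete, here is a small Python sketch of what a validation log could reveal once it is grouped by plate. The plates, businesses, timestamps, and the `itineraries` helper are all invented for illustration; nothing here reflects the actual schema of Republic Parking's system.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical validation log: (plate, business, time entered at the terminal)
validations = [
    ("ABC1234", "Ice cream shop", datetime(2018, 5, 4, 18, 10)),
    ("XYZ9876", "Coffee shop", datetime(2018, 5, 4, 18, 30)),
    ("ABC1234", "Movie theater", datetime(2018, 5, 4, 19, 5)),
]


def itineraries(log):
    """Group validation events by plate, in the order they happened."""
    trips = defaultdict(list)
    for plate, business, when in sorted(log, key=lambda row: row[2]):
        trips[plate].append(business)
    return dict(trips)


print(itineraries(validations)["ABC1234"])  # ['Ice cream shop', 'Movie theater']
```

A dozen lines is all it takes to turn "free parking credits" into a per-car record of where that car's driver spent the evening.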

Which brings us to data sharing. From a marketing standpoint, all the businesses would love to know what other businesses you visit. Ice cream and coffee shops would love to know that you go see all the Marvel movies on opening day, for example, so they know when to send you a coupon. But if Republic is collecting this info for their own business, can they really share it around?

“Republic Parking may disclose your Personal Information to third parties that are directly involved with the processing or storage of that Personal Information, for specified business purposes such as payment processing.”

Thing one: every business with a validation terminal is directly involved with the processing of that Personal Information.

Thing two: payment processing is an example of specified business purposes, not a comprehensive list.

And finally, there is this: at least one of those businesses (maybe more, I haven’t been to them all) is specifically equipped to match customer identities to individual visits and purchases, without using government lookup systems or payment information. Cinemark would be just thrilled if you’d like to join Connections (to collect rewards points for later use) or Movie Club (for sweet beverage discounts and a single discounted monthly movie pass), and accept their privacy policy that allows them to share your information via joint marketing agreements.

I did promise we’d get back around to using customer rewards cards to avoid legal concerns, right?

Once an agency with an interest in data collection knows your name, email address and phone number, the sky’s the limit. They know anything that’s public about you on Facebook, or any other source, and can purchase large data sets full of valuable marketing data, associated with those details.

Things I know, and things I do not know

Here are several things I can’t tell you with any amount of certainty:

  • That Republic will use your license plate details for anything but the parking transaction.
  • That Republic, Point Ruston and the businesses therein have any kind of data sharing agreement that lets them pool validation and other information to identify you and map your behavior.
  • That Cinemark has any joint marketing agreement with their Point Ruston neighbors.
  • That you will eventually start receiving targeted marketing from Point Ruston based on your shopping and date night history.

These are all speculative, and I will leave it to proper journalists who know what they’re doing to suss out any details that are available to be sussed out. Here are a few things I can tell you, though:

  • From a technological standpoint, they absolutely can do all of the above, with the systems we know to be in place.
  • Their privacy policy seems, to a data-but-not-legal expert, broad enough in all the right places to permit all of the above, should they choose to.

And then everything might change.

Last excerpt from a privacy policy, I promise:

“Should Republic Parking require your information for purposes other than those specified during collection or as permitted by this Policy, we will seek your consent where required, or a notification will be provided to you prior to the new use.”

You probably have Facebook. You probably agreed to their privacy policy without reading it. You probably get notices from time to time that they’ve updated their policy, and you don’t read those. Every once in a while, you’ll see a kerfuffle online because somebody noticed a change that bothers them. It might bother you. And then you probably ignore the next change, too.

A privacy policy that is subject to change means that even if you read the current version, and are happy with it, if they ever come up with a new use of your data, and are concerned they aren’t covered, they can amend the policy, and be mostly confident that you won’t be paying attention. So the big concern is this: even if they aren’t doing all these things, even if a lawyer comes along and says I’m wrong, and the language of their policies is sufficient to restrict their behavior, the infrastructure is there, and the policies can change. They are now equipped to identify you, track your behavior from business to business, and send you targeted marketing. Whether they actually do so or not is likely just a matter of time.


On Content Moderation and Human Eyeballs

In the weeks and months following the Charlottesville terror attack, among the after-effects has been rising pressure on online services to police hate speech and violent rhetoric on their sites. More qualified folks than I will have sifted (and continue to sift) through the moral quandaries of balancing a desire to stifle hate with a vision of an open and free internet, but I’d like to dig a little into the technical considerations of automated content moderation.

There’s a vision of machine learning as a kind of technological magic. Powerful computers and complicated algorithms sift through large swaths of data, and learn to recommend a movie, or identify your 2nd cousin in a photograph. Even the more tech-savvy users, who would never actually use the word magic, still consider this the work of a dispassionate, anonymous machine.

But there are a lot more human eyeballs involved than you might think. Most computers “learn” using a simple paradigm: look at a large swath of data, yes, but a large swath of data where someone also provides the answers. You don’t show a computer ten million pictures and ask it to tell you where the dog is in picture ten million and one. You show it ten million pictures and point at every dog. The aforementioned algorithm says, “Joe says this right here is a dog, and so is this, and so is this, and so is this. What mathematical set of features are common between them?”
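The "what is common between the labeled examples?" step can be sketched with a deliberately tiny model. This is a nearest-centroid classifier in plain Python, chosen only because it is short; real image systems use far more elaborate models, and the two numeric features below are made up for the example.

```python
def train(examples):
    """Average the feature vectors for each label (a nearest-centroid model)."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(centroids, features):
    """Pick the label whose average example is closest to the new one."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=distance)


# Made-up features: (ear floppiness, snout length), each scored 0 to 1.
# The labels are the "Joe says this is a dog" part: a human supplied them.
labeled = [
    ((0.9, 0.8), "dog"), ((0.8, 0.9), "dog"),
    ((0.1, 0.2), "not dog"), ((0.2, 0.1), "not dog"),
]
model = train(labeled)
print(predict(model, (0.7, 0.7)))  # dog
```

The model never sees a definition of "dog". Everything it knows comes from the labels a person attached to the training examples.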

To teach a computer to do a thing, a human needs to have already done that thing, many many times. Some use cases are fortunate to wield pre-existing data—methods of detecting positivity and negativity in language, for example, can look at movie reviews, with a helpful star rating or thumb label. Others rely on users—every thumbs up or down you give Netflix helps their systems build a picture of what B someone will like, if they also liked A.

Content moderation is not so simple. The end goal for companies is the automatic detection of terms-of-service violating language. So, how to teach this algorithm? Past data? We could leverage the content of known hate-group forums and news sites. But these forums are likely to have a large amount of off-topic, benign content as well. User labeling? Social media platforms have buttons to report or flag inappropriate content, but these are vulnerable to concentrated abuse, or the whims of a given user’s morality.

At some point, websites need eyeballs. And they have them, armies of content moderating “contractors,” performing the menial labor of the data economy: getting paid 4¢ a post to say “this is fine, this is fine, this is a violation, this is fine”—and if you think that job sounds like fun, you may not have it quite right.

There are (at least) three implications worth considering here, before we decide this is the way to go. (1) These people are in a hurry, (under)paid by the click, and clawing their way toward minimum wage. (2) These people are performing a variety of different tasks in rapid succession. One site might pay them to moderate 1000 flagged posts, the next might be that guy above, asking them to tell his computer where the dog is in 1000 pictures. (3) These people are, well, people. Even the most precise terms of service leave some wiggle room for human judgement; and most are instead intentionally vague, to allow for a lot of wiggle in either direction. Everyone’s got some bias in their tank, and the line for what is racist and what is not can vary drastically.

This is not a recipe for reasoned, consistent decisions. This is not “here is definitely a dog.” And it’s remarkable how easily even the most sophisticated technology can learn subtle bias, if the examples it learns from are biased in the first place. All it takes is a few people who keep pointing at the cat and insisting that it’s a dog. “Garbage in, garbage out,” is as true in machine learning as it is in cooking.
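A toy Python example shows how little it takes. Assume each post has already been boiled down to a single made-up "hostility score" between 0 and 1, and the model simply learns a cutoff halfway between the average "fine" post and the average "violation" post. Both the score and the learning rule are simplifications invented for this sketch.

```python
def learn_threshold(examples):
    """Cutoff halfway between the average score of each labeled class."""
    flagged = [score for score, label in examples if label == "violation"]
    fine = [score for score, label in examples if label == "fine"]
    return (sum(flagged) / len(flagged) + sum(fine) / len(fine)) / 2


clean = [
    (0.1, "fine"), (0.2, "fine"), (0.3, "fine"),
    (0.7, "violation"), (0.8, "violation"), (0.9, "violation"),
]
# Same posts, but a hurried reviewer waved the 0.7 post through as fine.
noisy = [
    (0.1, "fine"), (0.2, "fine"), (0.3, "fine"), (0.7, "fine"),
    (0.8, "violation"), (0.9, "violation"),
]

borderline_post = 0.55
print(borderline_post > learn_threshold(clean))  # True: flagged
print(borderline_post > learn_threshold(noisy))  # False: slips through
```

One changed label out of six moves the cutoff enough that a borderline post goes from flagged to allowed.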

Does this mean that humans can’t do the job of teaching machines to detect hate? No. But it’s essential that the companies relying on these methods think long and hard about how to make their labeling process as effective as possible. Maybe pay the workers a little more to encourage a slower, more thoughtful response. Maybe recruit a demographically diverse group of workers to stymie bias in any direction. Most importantly, just think. Think as hard about the process you use to train your model as you’d think before handing any other critical decision over to an algorithm.

Recommended Listening:

In addition to all the links above, NPR’s Note to Self has a fascinating interview with a contract content moderator on the ins and outs of the job.

A Brief Introduction to Food Deserts

As a still-learning data scientist (as if there were any other kind), I’m always on the lookout for interesting data to fiddle with, in my copious free time. Recently, my fiddling has turned toward the USDA’s Food Access Research Atlas, a resource for the study of food deserts in the US. In the coming months (here’s hopin’), I’ll be posting some of the work I’ve done, but first it’s worth diving into the concept a little.


Drawing Pierce County: Census Maps in ggplot2

A side project I am working on (more on that later if I can manage to blog more than once a year) has me needing to interact with several datasets that are broken down by census tract. This means that, among other things, I’m going to be outputting a lot of Tacoma and Pierce county map visualizations, and need to be able to outline and mark these tracts in a variety of ways.

So! Here is a rundown of exactly how I am pulling the census shapes, and squooshing them into ggplot2 in R. First, we’re going to need a library or two. Or six. Probably six.

Weird Data: Addendum

At the end of What’s Weird About This Data, and Why?, I mentioned that the knowledge gained by analyzing one outlier could tell us where to look for others. When I first looked at the big graph, growth looked vaguely exponential. But knowing that 2004 saw a whole new business type (Individual Landlords) added, things start to seem a bit more linear in the before and after. And if that’s the case then 2015 looks like another unusual spike.

[Figure: businesses since 1990, through 2015]

I’d originally guessed that maybe the most recent full year would always be the biggest because it includes businesses that might fail to materialize, and let their license expire, but since I knew from 2004 that a new business type could lead to unusual behavior, I did a similar summary of the NAICS code descriptions (again, only showing the first few here).

Code Description                       Count
Taxi Service                           1019
Lessors of Residential Buildings       526
Lessors of Nonresidential Buildings    86
Residential Remodelers                 86

Hoo boy, that is a lot of taxi services. If you’ve been paying attention, both to me and to the state of personal transport trends, you can probably guess the story: Tacoma requires all Uber drivers to have individual business licenses, just like traditional cab drivers.
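For readers who want to try this kind of summary themselves, here is a minimal Python sketch of counting license descriptions for a single year. The original analysis was done in R, so this is not the code behind the table above; the rows and the `top_descriptions` helper are hypothetical stand-ins for the real license export.

```python
from collections import Counter

# Hypothetical rows from a business-license export: (year issued, NAICS description)
licenses = [
    (2015, "Taxi Service"),
    (2015, "Taxi Service"),
    (2015, "Lessors of Residential Buildings"),
    (2014, "Taxi Service"),
    (2015, "Residential Remodelers"),
]


def top_descriptions(rows, year, n=3):
    """Most common NAICS descriptions among licenses issued in one year."""
    counts = Counter(desc for issued, desc in rows if issued == year)
    return counts.most_common(n)


print(top_descriptions(licenses, 2015, n=1))  # [('Taxi Service', 2)]
```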

It seems reasonable to expect this kind of thing will show up more and more, as middleman services like this enable individuals to easily sign up as independent contractors in a variety of industries.

What’s Weird About This Data, and Why?

Exploratory data analysis is the “real world observation” of data science. In proper rigorous statistical analysis, you are supposed to have a hypothesis before you test your data, but that’s based on the idea that you’ve already observed some phenomena in the real world, and developed a hypothesis based on that. With data science, the data set and the “real world” are sometimes the same—the data you’ll eventually analyze is the thing you also have to observe to come up with questions in the first place.

So you do some exploratory analysis. Make a few simple visualizations, summarize the data, poke around, looking for fun facts. Often you learn things that wouldn’t have occurred to you without exploration, and the information gleaned can help you avoid mistakes moving forward.

This particular example was supposed to just be a self-driven exercise to practice a few core functions for basic recoding and visualizations in R. I fished around on the data.cityoftacoma.org portal, and downloaded a data set of all open business licenses issued by the city.

Rattling My Headbones


I lived in Spokane for a couple years, and I spent a lot of time biking on distraction-free trails. During that time I marathoned every available episode of Radiolab, and since then I’ve become a bit of a podcast junkie. I don’t have much time to watch TV or read recreationally these days, so instead I listen to podcasts whenever I don’t need my brain for something—on walks, doing the dishes, etc.

Diving headfirst into data science meant, among other things, adding a few new podcasts to the list. So here’s what I’ve been listening to. (I promise there will be some technical stuff one of these days.)


Government Data, Big and Small

The Obama administration has been pretty big for data science. They appointed the country’s first official Chief Data Scientist, and launched data.gov, a clearinghouse for government-generated public data sets. President Obama also signed an executive order declaring open and machine-readable formats as the new default for all forthcoming government information resources, guaranteeing the continuing availability of interesting data.

Beyond the practical use of generating actionable conclusions from public data, any repository of up-to-date, varied data sets is invaluable to beginners looking to hone their skills and build a portfolio of projects.

But self-directed amateur research can also present a significant roadblock. Exploratory analysis (essentially poking around in the data and learning interesting things) is important, but the real skill of an employable data scientist is the ability to frame and answer a well-defined question.

[obligatory welcome post]

Oh, hi.


I’m Joe. I’m the Senior Systems Administrator at a small web design company, in Tacoma, WA, which basically means I live torn between keeping the servers that host our websites running, and finding the time to make them better. I also run the databases, and try to be the resident expert whenever complex queries are needed to extract useful reports.

I’m also a newly minted graduate student in the Masters of Information & Data Science (MIDS) program at UC Berkeley. I’ll get more into broader questions of what exactly data science “is” later, but basically it means I’ll be spending the next 20 (19 now I suppose, got a late start on the blog) months learning how to find interesting data, learn from it, analyze it, and help people make good decisions based on it.

Along the way, I know I will need more practice than just the coursework will give me, and that’s what this blog is for. I want a place to gather my thoughts as I learn, and a place to talk about side projects or little problems I’m working on.

I’ll say right here that I’m not an expert. I don’t have a rich background in data analytics, like some of my classmates. I intend to be an expert, but for now these are the ramblings of a learning amateur. So if you are a data science professional looking for sophisticated analyses, or advanced tips and tricks… well, come back in a couple years.

So what will be here? Mostly I’ll try to keep things short (full time job + school + toddler = pretty full days): code snippets, quick thoughts on this or that article, concepts I want to write down to make sure I understand them, datasets I’ve found that might be worth further investigation. Once I get better with ggplot2, interesting visualizations I’ve made. Occasionally, I’ll try to do some longer case studies of data explorations and side projects.

I’ll probably start talking about baseball statistics a lot. You can ignore that if you think it’s boring. I’ve also had a long-standing, unconsummated interest in digital signal processing, so I’m sure there will be some of that once I get into time series analysis, and a little fun with data sonification.

This and that. Things and stuff. Big data, little data, what begins with data? This blog, that’s what. Enjoy!