Matt Cutts of Google responded informally to the Privacy International report, citing the privacy lapses of AOL and other companies. I'm not going to address these, particularly since Donna Bogatin has already done so, and done it well.
Some of Matt's readers assumed PI was motivated by some form of corruption or corporate bias, or that the report was an attempt to garner attention. I doubt the corruption, but it's true that watchdog groups operate almost purely on the attention they can generate about an issue.
Regardless of motivation, none of this discounts what the report says, or how people are reacting to it. That's what should be the focus of our discussion now, and that's what the folks at Google should be paying attention to. I wrote the following in comments at Matt's post:
Try this on for size:
Start from the premise that perhaps this organization’s concerns are a legitimate reflection of how people are going to perceive Google in the years to come. And then think about how this could be a way of ‘kicking’ Google out of its complacent dependence on the goodness of its search for the ultimate algorithms, by reminding those in charge that the internet is more than a set of calculations.
This is an opportunity for Google. Unless you see it this way, get used to having your spleen agitated on a regular basis.
What's not needed is more PR for Google, or some form of privacy czar. The former will just recommend more paint for the cracked walls; the latter will get caught up in the mechanics of what Google is: an organization centered on the push for perfect searches, with perfection determined in a scholarly vacuum.
I've never been particularly concerned about Google releasing data outside of the company, or to the government. Well, at least not the US government. It's not in the company's interest, either from a corporate standpoint or from the personal views of its workers.
What I've long been concerned about is what Google does with this data internally. You see, I have a difficult time trying to figure out why having IP addresses associated with specific searches is such an important component of the algorithms for these searches. After all, one doesn't need to know the exact IP address to see which results people click through to from a search. One assumes that when creating a generalized search, click-through rates would be sufficient.
But no, the company specifically stores IP addresses with these searches. I'm not a search engine whiz, but one can assume the company is checking for patterns across a universe of searches. This gives it the ability to take search to the next level of complexity, and the more information it has about how individuals approach search as a whole, the better it can fine-tune its algorithms. In fact, from the company's point of view, it would be nice if we searched under our names so it wouldn't have to muck around with IP addresses that can change. It will even provide us with a history of such searches.
However, attempts along this line have not met with universal acceptance, which must be a source of frustration to the company. Why? Because the company really isn't 'evil', and doesn't have 'evil' intentions with its data. If it did, our lives would be so much simpler.
I assume the company is storing the IP address with the searches in order to generate more semantically meaningful search results: exploring search as it relates to other searches, and perhaps even having the engine 'learn' from previous search efforts and adjust results accordingly. This is not necessarily a bad thing to do, though people behave according to their unique environments, built from life experiences, and this tends to blur the ability to derive any universally useful heuristic from captured patterns.
However, add this to the other data that Google can capture, either about a person specifically or about a given IP address at a given time:
- Payments through Google Checkout, which provides valuable information about our buying patterns
- What weblogs we subscribe to, and which items in those weblogs we actually click through to. We can assume that a 'click through' denotes a heightened level of interest
- Information stored or maintained through Google applications, such as documents, spreadsheets, software, email, or our calendars. That's a pool of potentially very personal, and therefore richly enticing, data
- Where we live, where we're going, how we're getting there, and with calendaring, why we're going
- What I read. What I read anywhere on the web.
- What I write. Also anywhere on the web.
- What groups I participate in now, and what Usenet groups I participated in over the years
- What videos do I watch? What images do I work on, and which ones do I view?
- Who are my friends? What clubs do I belong to? What political party am I a member of?
- What are my financial investments? What companies am I most interested in right now?
I culled this list from the Google applications I know of, all reflecting the type of data that Google can collect, and most likely is collecting, about us. That's a lot of data. Why is it collecting all of this? For better searches? Not likely. In order to personalize the web? There could be something to this, and this is one area where our interpretations of Google's activities can differ widely.
Many of us seem to feel 'flattered' or even grateful when software remembers things about us. I'm not sure why; perhaps it has something to do with feeling alienated from this large world, or from those around us. Perhaps we're just lazy, and anything that promises us simpler access to data is viewed as 'good'.
Google also projects this warm feeling of intimacy through the simplicity of the company's interfaces to many of its tools. They are not especially polished or sophisticated. They strike one as being efficient, simple, clean, and straightforward. There's never even a hint that Google is a multi-billion-dollar corporation that's becoming one of the most powerful influences on both our culture and our lives. A company whose shareholders recently voted to continue doing business with China rather than take a stand against that country's repressive policies.
Yet Google gave us satellite views in our maps, and tools and toys we can use as much as we want without once charging us a penny. It gave us Developer Days, GWT, and Maps; it supports open source and hires some of our favorite people. It is a teddy bear. A really big, really smart teddy bear.
Looking beyond the fur, though, we have to remember that Google is a company that can be both ruthless and single-minded in its determination to follow the course it's charted for itself. For all that we may like those members of Google whom we know through weblogs, conferences, or other associations, they are only part of a much bigger whole. Their individual beliefs and personal morals can only influence the company as much as that inner, powerful sanctum that is the heart of Google allows. And the inner heart of Google is based on a corporate belief in the ultimate righteousness of its algorithms. A belief that over time, as these algorithms are allowed to increase in sophistication, they will filter out bias, bigotry, and ignorance. The company believes passionately in its research, so much so that it can't even comprehend why we're so worried about privacy. What was it one person wrote in Matt's comments? People that aren’t us won’t get it.
Many at Google would most likely agree with me when I say the following: there is a purity of purpose behind such efforts at Google. I have no doubt that Google's efforts really are focused on finding the best results. I also have no doubt that Google sees this as being of benefit to the community. I might even agree with the necessity. Agree until…
Until the results are used to monetize who we are, and what we do on the web. To know exactly what ads we'd be most vulnerable to at any point in time. To mark who is potentially dangerous, and who is not. To determine what it thinks we really want to see when we come looking for information. Perhaps even to determine who is not worthy of being seen.
Tell me what job should I take, Google. Anyone remember that?
Information is power. I once wrote that Google is one of the most dangerous companies I know, and I was discounted for making a grandiose claim. Yet there is probably no entity in the world that knows more about us, that has more information on us, than Google. Not even the IRS knows that I like Firefly or that I vote Democrat. Neither the state nor a potential employer knows that I'm searching for low-cholesterol recipes. The Department of Homeland Security isn't aware that I daydream about traveling, and plan secret little vacations that I can't afford. No government in the world is party to my fears, hopes, dreams, and worries as much as the search engines I use.
Some would say, correctly, that there is a simple solution: don't use Google products. True, I do switch search engines on a daily basis. But search engines are only the tip of the iceberg. What happens when Google starts tracking through ads? Through page readers? Through Google Analytics and Reader and Books and what all? Through the hundred other little areas that we look at with such fond indulgence because they, you know, have cool APIs?
Keeping our data in raw form for up to two years? Why so long? The reasons given make no sense. They never did. How much data is gathered, and will be gathered with new acquisitions, is also unknown. Google has bought a lot of companies, and is associated with still others. We really don't have an idea, at all, how much information is being tracked to us through cookies and IP addresses. We also don't know who in the company has access to it, and how much the data is directly connected to individuals.
Google wants to know all about us, but isn't willing to let us know enough about it for us to make rational assessments of our privacy risks. When, lacking such information, we write based on conjecture, it pooh-poohs our fears, discounts our worries, and repeats that it is 'better than other companies'. That is the equivalent of us in the US saying our form of torture isn't as bad as that practiced in other countries, and at least our methods don't leave scars.
Most worrisome of all the gaps in our knowledge of Google's operations is what profiles are generated from the data that Google collects, and exactly how long such derived information will be stored. What was it the folks at Google said once? That the company wants to eventually store the web? If so, then space is not a concern. I imagine much of me can be compressed into less than a gig.
Regardless of your opinion of the Privacy International report, isn't it about time Google realized that not everyone shares the same faith in the company's purity of purpose, nor the same belief in the inherent neutrality and fairness of algorithms? Two years. What was I searching for two years ago? I can't remember now, but Google can. Two years. That's longer than my first marriage. Come to think of it, Google probably knows as much about me as my first husband did, or more. Considering my first husband, though, this isn't surprising, and it's one of the many reasons I divorced him.
Unfortunately, I don't have the option to divorce Google.
