over 10 years on 2008-04-01


over the last few weeks several friends and a few journalists have pinged me regarding the evolving situation with youtube/viacom… at an extremely abstract level the case actually relates to the philosophy that undergirds drop.io, and as a result i have both watched it evolve with great interest, and hopefully been in a position to offer a unique perspective within the sea of discussion and commentary.

many smart and insightful people have been dissecting the blow-by-blow extensively, so rather than talking about the nitty-gritty, i have been approaching it from the opposite end of the spectrum and thinking about what the case illustrates about how data is owned and controlled in the digital sphere. specifically, i have been speaking about how the case relates to jeremy bentham’s panopticon as discussed in michel foucault’s “discipline & punish”, and how it is ultimately an expression of the concept of “power-knowledge” in terms of the modern information economy. this stuff is very scary and accelerating very fast, but nothing new.

as this mini-cycle in the news starts to go stale i thought it might be worthwhile to formalize on ‘paper’ some of my thoughts regarding the case and what it means. by my thinking there are four critical and related lessons to be abstracted from youtube/viacom.

1. for very clear structural economic reasons the rate of information collection is accelerating very quickly.

2. it is basically impossible for users/consumers to know what information is being collected by which service providers.

3. it is impossible to control how information will be used once it is collected (pools of “power-knowledge” diffuse and get appropriated by third parties)

4. the internet is fast becoming a true expression of jeremy bentham’s panopticon – to place a value judgment, this is quite frightening.

the youtube/viacom case is not the first to illustrate these points, and it will not be even close to the last, but it does serve as a valuable and highly current lens through which to discuss these issues, and their increasingly practical relevance.

#1: to begin, information is being collected and stored almost everywhere by almost everyone at an accelerating pace as a simple function of the economics of information. almost all information is now profitable. as the cost of collecting, storing, and using information continues to decline and the value that can be extracted from its use continues to rise, more gets collected. the cost side of the equation is easy enough to understand: it is simply a derivative of moore’s law. unless we hit some catastrophic energy wall, the reality is it will continue to get less expensive to collect, store, and analyze (bandwidth + processor + disk).
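as a rough illustration of that cost curve, here is a minimal sketch – the starting price and the two-year halving period are hypothetical numbers chosen only to show the shape of the decline, not measured figures:

```python
def cost_per_gb(start_cost, years, halving_period=2.0):
    """project storage cost assuming it halves every `halving_period` years."""
    return start_cost * 0.5 ** (years / halving_period)

# assume an illustrative $1.00/GB today and project a decade out
for y in (0, 2, 4, 6, 8, 10):
    print(f"year {y:2d}: ${cost_per_gb(1.00, y):.5f}/GB")
```

under those assumptions, a decade is enough to cut the cost of retaining a byte by a factor of 32 – which is why "store everything forever" keeps getting easier to justify on the cost side alone.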

the ‘value’ side of the equation is a little bit murkier, but still easy enough to follow. data is generally useful across a whole range of activities. for almost all companies (web and otherwise) the more data you have the more you can streamline your processes to both lower your cost structure and grow revenue. data is the lifeblood of everything from figuring out how to architect your processes to be maximally efficient, to attracting new customers, to making your new customers more profitable. all of these ‘values’ for data come before the concept that data is, in and of itself, increasingly considered an ‘asset’ which has an assignable value on the open market (the ‘asset’ value of data is actually quite suspect in the long term, but the other values are clear). the upshot is that everything is being collected by everyone at as fast a rate as possible.

this may actually be the ‘shocker’ on youtube/viacom to the general public if there was one at all… unlike search engines, which everyone expected to be holding tons of highly specific data, i think a lot of ‘non-industry’ users didn’t realize that every video they watch is being catalogued and stored forever, regardless of whether or not they were youtube members.

#2: with this understanding, the real issue becomes disclosure of exactly what is being collected. people do assign value to their data (as well they should), which means that companies should be forced to make tradeoffs between the value of extracting data from customers and the costs they are directly levying on their customers in so doing. this is exactly how the scenario plays out when individuals are asked to volunteer information about themselves in the form of an online ‘registration’ in return for service (what is your name, age, gender, etc.). but, when it comes to the vast sea of use/interaction data that is being catalogued, companies generally don’t have to make those calls because it is impossible for their customers to know what is being collected. use data, which in aggregate can map back to individuals, is collected at no immediate direct cost to the user.

so, without the tradeoff of directly imposing on users to collect data, services are basically collecting ‘free money’. a company may or may not incur an initial tiny cost in the form of user annoyance by asking for a name and email address, but the vast iceberg of use data against which that requested information is mapped comes at almost no cost, and the initial annoyance can be amortized over it. consumer information is critically valuable to these companies but it costs them almost nothing because users generally don’t know what is being collected (future blog posts on how users should be extracting value here).

this can’t be regulated. it is impossible to mandate that information not be collected: the internet is fully de-centralized, and there is no way to know from the user end whether information is being logged or not. sure, you could set up a system of audits, laws, and verification schema, but practically speaking it would never work (for a whole host of fun reasons to discuss some other time).

again, youtube/viacom is interesting on this axis only because people had no idea what information was being collected and what information youtube even had to disclose in the viacom case. it isn’t simply a matter of popping open a file on your desktop to know what they have (as it was with the ‘cookie scare’ of the late 1990s)… very sophisticated people had a very hard time following exactly what ‘information’ youtube had collected and what they were being asked to disclose in conjunction with the case.

#3: none of this is meant to say that the people collecting the information intend to use it for malicious purposes or in nefarious ways (though there is plenty of that)… it just means that there exists more and more information collected into databases which represent raw agnostic ‘pools’ of power. i would personally highly doubt that youtube anticipated being forced to expose the user data they collected and saved from their customers when they made the early architectural decisions to collect the information in the first place. the system architects were probably just hoping to use the data to provide a higher service level at a lower cost to youtube’s users… but once information is collected and codified it is at risk of exposure. once youtube decided to collect information from their users they were fundamentally putting their users at risk.

in abstract terms, information is simply power… and large pools of ‘power’ are enticing targets for exploitation in almost any form imaginable. there are the forms of exploitation with which we are all highly familiar: abuse by those who legally own the information (companies selling their ‘lists’), compromise by ‘hackers’, etc. but there are many other forms of potential data transfer and/or appropriation by third parties. governments may make demands on the pools of data (see google/china), third party companies may attempt to gain access to the data (see youtube/viacom), etc.

point three is thus relatively simple but utterly critical. when you collect a lot of data you are creating something which is of extreme value to a whole range of people in the present and future. when things of extreme value exist all sorts of actors will try to get access and appropriate a portion of that value. users should be aware that even when they are dealing with very good companies and services there is a high risk that someone, in some form, will come knocking and the built up knowledge-power in the service’s stores may be compromised, today, tomorrow, or infinitely into the future.

your information will ultimately not only go to the highest bidder, but to all bidders who are willing to pay a positive fully baked price. the youtube/viacom case is a perfect example of this. it is just business, nothing personal. youtube collected an enormous amount of data, and viacom has a use for it that is sufficiently valuable to them (by their estimation) that it is worth the fully baked cost of getting the data (legal fees, public opinion, etc.).

#4: so, where we sit is basically that if you use the internet you should fully expect that your data will be collected without your full knowledge or understanding, and it is likely that that information will fall into the hands of those who find the data most valuable – other companies, governments, hackers, you name it. to be blunt, you need to assume you are being watched. this sounds very science fiction and highly alarmist, but it is a simple function of economics.

should anyone care? maybe… many people, especially in comfortable western democracies, would say they have nothing to hide so there is no harm in everyone knowing what they are doing. this is one approach, and it has its merits.

that said, i think it is important to remember jeremy bentham’s ‘panopticon’. bentham’s panopticon was a theoretical piece of prison design (technology) with a single watchtower in the middle of a circle of cells which have bars on both sides. a guard could sit in the watchtower and, with just natural sunlight (this was well before video cameras), see all the prisoners’ activities all the time. realistically, any given prisoner may or may not be watched at any given time, but the prisoners know that they could be watched without their knowledge, and therefore they act as though they are always being watched. the panopticon is powerful in that the prisoners internalize that they are being watched and change their action calculus accordingly. it is the internal manifestation of the power created by the panopticon that functions to change the behavior of the prisoners.

the internet is basically a more advanced, real piece of technology that has the same result. you might be being watched and you might not be being watched, but you have to assume that you are always being watched (or at least factor a very high percent chance that you are being watched into your action calculus). i think people would be hard pressed to honestly say that they would always act the same way regardless of whether or not they are in ‘public’…

what about batman? these themes and ideas are cropping up everywhere in modern pop culture. the most recent batman (which was excellent) deals with a lot of these questions and issues, most blatantly in the debate over a super powerful mass cellphone tapping system that the bat constructs to monitor gotham (which looks one heck of a lot like a panopticon without the central feedback loop to me). in the movie lucius fox threatens to resign over the construction of the machine on ‘moral’ grounds… i think morality has little to do with it and would make a very straight-laced economic argument against such a machine (more on that some other time)… the point is that these themes are everywhere, ranging from movies to a heated debate that broke out on friendfeed the other day when robert scoble suggested that he would ideally like to outlaw anonymity online (i couldn’t possibly disagree more).

ultimately this youtube/viacom case is just an ah-ha moment for a lot of people that even if they don’t sign into accounts, their video watching behavior is still being tracked and can still be exposed… so maybe you won’t watch the ‘follow me’ video 1800 times in one day anymore, or maybe you will… but either way you have to assume that at some point someone else will know – how you react to that is up to you.

i don’t have an answer to any of this – i struggle with it a lot. what i do see is that there is a new movement gaining traction online which suggests that services need to collect as little data as possible from users to fulfill their function, because ultimately the only way to keep information safe is to not collect it. i suspect that as consumers become more aware of the full risks to which their information and identity are exposed this movement will gain more traction. much like the movement towards "leave no trace / zero impact" camping, we are moving towards a world where "zero-impact" informational exposure will be highly desired online. but even this type of movement might not be an ultimate answer; it has its issues, and it does nothing to address the transparency of information collection (which is a critical chokepoint in the whole equation).

to allow myself even more parting leeway… from the limited reading i have done about brain science/memory, the act and process of ‘forgetting’ is actually critical to memory and the healthy functioning of the brain. societies either need to learn to forget or need to radically adapt to ‘total recall’.

a few final caveats: i feel i do need to leave with a few disclaimers on the above post. 1. it isn’t a tight academic paper in any way, shape, or form, and i know i have taken some liberties with the concept of “power-knowledge”; if someone wants to discuss this with me i would love to get a coffee. 2. the beauty of the internet is nothing is set in stone and i reserve, as always, the right to keep thinking about and writing on this… so, let’s call this a summation of my thoughts on youtube/viacom but only a conversation starter on some of the broader points.

direct link: http://drop.io/swl/asset/youtube-viacom
want to comment, use friendfeed: http://friendfeed.com/lessin

original swl blogposts and letters 2007-2010