almost 7 years on 2010-08-25

[LETTER] NATURAL [WORLD] PROCESSING VS. STRUCTURED LANGUAGE == GOOGLE VS. FACEBOOK AND WHAT IT MEANS FOR LOCATION


in one of my favorite seinfeld episodes ever, kramer ends up being the 'movie phone' guy. people call him for movie information and start punching in, on their phones, the numbers of the movies they want to see -- kramer starts by trying to guess what they are pressing, but then in a moment of frustration he says "why don't you just tell me the name of the movie you would like to see" - this is the difference between nwp (google) and swl (facebook)

~~~~~~~~~~

i think what people are really missing is that the goog vs. fb hubbub isn't about social vs. search / intent vs. identity, or any of the other battle lines people commonly reference in the popular press... it is really about divining meaning from what exists (natural language processing, natural location processing, etc.) vs. re-structuring language itself to generate more explicitly parseable meaning and value. google is a company which processes existing dirty data streams and wraps structure around what exists, facebook creates a structured template and asks people to fill in their template -- mad-libs style....

for the last 5+ years people have been claiming that "web 3.0" is all about semantics. for most of that period that was understood to mean better processing of the data we collect re: how things relate to each other -- it was told as an nlp story, with companies that specialize in processing and algorithms first and foremost at the front. the problem is that nlp hasn't advanced anywhere near as quickly as people expected, and what is becoming clear is that there might be a far more effective way to get to semantics: re-writing the way people communicate / language itself. so, if it is too hard to parse out nouns, verbs, etc, just spin up a structured database, write the indexes, let everyone fill in the structure, and query away at low cost to your heart's content -- think of it this way:

--- google is a "natural processing" company. their first and biggest success was a function of crawling a data-set of links and information that was already exposed on the web and calculating meaning out of it (processing page-rank on an already public and existing - if poorly understood - web). this was a processing-driven exercise. if you think about the vast majority of their efforts from there forward, the approach is always about mining existing data-sets to generate more value, or - as they put it - to 'organize' the world's information.
--- facebook is a structured-language company. their first success wasn't about calculating meaning out of existing data, it was about building an interface to capture data that implicitly existed in the world but wasn't explicitly mapped or structured anywhere (the relationship between people). this was an exercise in explicitly structuring the language of a "friend" relationship, data which didn't already exist. the vast majority of what they have done / are doing is about structuring nouns, verbs, and adjectives. they don't process heavily on what exists in the web, rather they focus on getting people to generate and structure information for them in the most useful form for their ends.

google uses existing data 'exhaust' and cleans it up to make it more useful, facebook just captures new data sets in the structure they need and then, ahem, "q"s it.
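to make the "structure it, index it, query it" side concrete, here is a minimal sketch using python's built-in sqlite3. the table name, column names, and the sample people are all made up for illustration; this is not a claim about how facebook actually stores anything, just the shape of the idea: people fill in a fixed template, and the query is cheap because nothing has to be inferred.

```python
import sqlite3


def build_friend_graph(pairs):
    """Store explicit 'a is friends with b' declarations in an indexed table.

    The schema is hypothetical: one row per direction of each friendship.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE friend (person TEXT NOT NULL, friend TEXT NOT NULL)")
    # The index is what makes the later lookup cheap.
    db.execute("CREATE INDEX friend_by_person ON friend (person)")
    rows = []
    for a, b in pairs:
        rows.append((a, b))
        rows.append((b, a))  # friendship is declared once, stored both ways
    db.executemany("INSERT INTO friend VALUES (?, ?)", rows)
    return db


def friends_of(db, person):
    """Answer 'who is this person friends with?' with a plain indexed query."""
    cur = db.execute(
        "SELECT friend FROM friend WHERE person = ? ORDER BY friend", (person,)
    )
    return [row[0] for row in cur]


db = build_friend_graph([("ann", "bob"), ("ann", "cy")])
print(friends_of(db, "ann"))  # ['bob', 'cy']
```

there is no parsing and no guessing anywhere in that code, which is the whole point: the hard part is getting people to fill in the rows.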

~~~~~~~~~~

so, what does this mean for everyone's new best friend, location:

google started the consumer location game with "latitude", which basically takes the approach of saying: your phone is already registering with towers all over the place and throwing off location data, we can take that and process it into meaning. they are working on organizing existing information which is under-leveraged. this isn't natural language processing in the human sense, but it is natural location processing of the cell-tower-strength and phone id. with enough processing and history they can back out who i am, where i am, and who i am with from the data... we speak/leave data trails, google tries to figure out what we are saying.

facebook is starting with a check-in vocabulary which takes the approach of structuring a language around location, and then asking people to actively contribute information into their framework. when using places i am structurally declaring: i [name] am at [location] with [people] -- there is nothing passive about it / they aren't wrapping structure around what exists or trying to guess, they are creating a platonic structure and getting people to change their language to match. they write the interface, and we change our calls to conform to their spec.

so, every time i "check in" to a place, i hear kramer saying, "why don't you just tell me the name of the bar you are currently at"
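the two location approaches can be sketched side by side. this is a toy illustration only: the tower names, signal numbers, and the "strongest tower wins" rule are assumptions of mine, not how latitude or places actually work. the check-in record simply mirrors the template above, i [name] am at [location] with [people].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckIn:
    """Structured declaration: i [name] am at [location] with [people]."""
    name: str
    location: str
    people: tuple = ()


def guess_location(tower_signals, tower_sites):
    """Processing side (hypothetical rule): pick the place nearest the
    tower with the strongest signal. It is a guess, and it can be wrong."""
    strongest = max(tower_signals, key=tower_signals.get)
    return tower_sites[strongest]


def declared_location(check_in):
    """Structured side: nothing to infer, just read the field."""
    return check_in.location


# Made-up data: the phone is in the bar, but the tower next door is louder.
signals = {"tower_a": 0.4, "tower_b": 0.9}
sites = {"tower_a": "the bar", "tower_b": "the deli next door"}
me = CheckIn(name="me", location="the bar", people=("a friend",))

print(guess_location(signals, sites))  # the deli next door
print(declared_location(me))           # the bar
```

the first function is kramer guessing at the key presses; the second is the caller just saying the name of the movie.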

~~~~~~~~~~

1. some strategic plusses and minuses of nwp vs. structured language: there are many on each side.

--- the nwp (natural world processing) approach is hugely powerful when you have either good data-sets which others can't replicate or huge processing power others can't replicate - but it doesn't work if the data-sets are open and the computational power needed is too easily accessible... search has become a commodity largely because the data-sets are not proprietary and the processing power needed is now relatively cheap. i think this is a long-term problem for nwp companies, because you can't sustain your comparative advantage -- nwp also, quite frankly, hasn't been very successful recently -- natural language processing, for instance, is nowhere near where we expected it to be at this point.

--- the structured language approach doesn't require any pre-baked data set nor does it really require much processing, but it does require a ton of scale/use to become powerful. facebook accomplishes this by taking the world and dividing it into networked units where otherwise worthless information becomes valuable enough to a small group that wants to consume it that i will publish it (i wouldn't announce into the ether that i like "faction skis" but i will announce it to a known audience of friends)... the act of filling in the mad-lib itself (be it who i am friends with or where i am) is what generates the utility for users, and the value for the company -- but this is basically impossible to do without crazy scale.

2. despite the recent media brouhaha -- by the inherent nature of nwp vs. swl, nwp companies really should face more privacy issues than swl companies

--- structurally nwp is about taking existing data-sets and repurposing them for ends that were not initially intended, which means that people have already "contributed" to them and can't easily toggle their use... sl, on the other hand, has the benefit that users have to contribute explicitly to the system for the system to know anything about them. that leaves all the power on the edge, at the time of publishing the information - which is a far higher control position to be in.

3. i believe that the big issue for google is that they haven't demonstrated that they understand structured language as well as they do nwp, and they probably need to figure it out

--- (see, for instance, the fiasco of the buzz launch, which failed because they defaulted the friend lists on, a clear sl fail, because users should get explicit utility out of adding friends)... they have the scale, can they transition it into generative sl information piping? but on the flip side, of course, facebook has definitely blundered from a pr perspective when they have tried to extend out sl with nwp (see beacon)...

4. game dynamics

--- other smaller players have recently tried to short circuit the sl game with "game dynamics" like foursquare badges (which are an attempt to get me to use a structured language that otherwise i wouldn't care about / before true social dynamics kick in).... i do believe that it can do more than offer a kick start, i don't think it is sustainable at scale, and i do think it pollutes data sets.

5. which leads to the point about hits (in the crowdflower/turk sense)

--- they are a tiny tiny portion of the overall human input economy. facebook is buying hundreds of millions of people's cycles on the cheap right now / many orders of magnitude more than can be bought for cash through turk.

6+. re the last letter... i got a lot of awesome feedback

--- which will make its way into future letters. highlights include ross barna pointing out that while the social capital shift is all well and good, the economic impact is minuscule compared to some of the macro trends in china, a conversation with spenser lazar on the fact that "publicity" in the sense of distribution of message is more expensive than ever, and brad hargraeves pointing out that the real point is that "everything is denominated in everything else" along with a bunch of other good points.


original swl blogposts and letters 2007-2010