Monday, February 14, 2011

Quantitative Measures of Social Data

I've talked before about whuffie, going all the way back to one of my first posts, and I'd like to see if I can expand on the subject a bit. Humanity keeps gathering a new mass of data made available by the net and computers, but we are still looking for better tools to analyze it.

We've got many new projects that are beginning to use the mass of data available through websites, blogs, Twitter, social networks, cell phones, GPS systems, and such: MIT's Billion Prices Project, Google's Ngram Viewer, and so on. All of that is still in its most rudimentary stages. New ones are popping up, like Swipely, that I'm quite excited about. If we can start seeing more accurately how money flows through the economy, that will have a huge effect on economics. Or if Open Social or Diaspora ever take off, or Facebook opens up, or Google finally gets a popular social network, we can start associating all this data with people's social connections.

Obviously more data is being created, and it is being analyzed in some useful ways. That's not news. Google takes all the hyperlinks scattered across the web and turns them into concrete search rankings. My issue with all this is that I want to be able to cross-reference all these different domains and start finding more uses for the data. Part of that is that we're going to need more direct, easy-to-quantify measurements. New scientific advances come from new tools and measurements. Or whatever that quote was. I'll go read through some Kuhn and get back to you.
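For a sense of how scattered hyperlinks can become a single ranking number, here is a minimal sketch of the basic PageRank idea using power iteration: each page repeatedly passes a share of its score along its outgoing links, and a damping factor (0.85 here, an assumed value) mixes in a small uniform score. This is the textbook simplification, not Google's actual ranking system, and the function name and link data are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration (illustrative sketch only).

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally ranked.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(sorted(pagerank(links).items(), key=lambda kv: -kv[1]))
```

In this toy web, page "c" ends up on top because both other pages link to it. The point is simply that a tangle of individual links can be boiled down to one comparable number per page, which is the kind of move I'd like to see applied to social data.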

What sort of new measurements do we need? I keep referring back to whuffie, and then get all hand-wavy about actually determining what it is. In the book that coined the term, Cory Doctorow's Down and Out in the Magic Kingdom, whuffie is supposed to be a simple measure of one's reputation. It is the main currency in the book because its characters live in a post-scarcity economy. I keep asking myself: is that even an imaginable goal? I like the idea of measuring someone's reputation, or contribution to society, as a single simple number for many reasons, but I'm not sure how possible it is. We like money because it measures an incredible mess of things with one simple number. I feel that in many ways money is so important to people because, as a simple number, it's an easy way to keep score in the game of life. I'd like to provide an alternative simple number to measure some of the things money doesn't. Or, at least, to provide other ways to measure the externalities that money misses.

We might be seeing the beginning of that with things like Reddit's karma. The Whuffie Bank is trying, but is doing a really poor job of it. Part of the issue is that reputation and social worth are so poorly defined that, by starting with the concept of whuffie and working back from it, we're approaching the problem from the wrong direction. I'm not sure an absolute, objective measure of reputation or social worth is even possible, since any such thing seems to be relative to each person's perspective. Even if we accept the value as relative, we still have no idea how to measure it. The best way to get where I'd like to be would be to start with simpler, more easily definable metrics that we can actually use.

So, in searching for more precise quantifiable metrics, what questions do we want to ask? How similar are another person's tastes to mine? Or their tastes within a specific domain? This one is being attempted through various recommendation engines, and though they still have a long way to go, they are producing usable results. Similarly, we could see how similar one's personal spending habits, geographical movements, or social network are to other people's. I suppose all those dating sites are already matching stated tastes in dating and sexual partners.
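To make "how similar are another person's tastes to mine?" concrete, here is a minimal sketch using cosine similarity over ratings, one common measure in recommendation engines (many real systems use other measures, such as Pearson correlation). The optional `domain` argument restricts the comparison to a chosen set of items, which is one hypothetical way to ask about tastes within a specific domain. The function names and rating data are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two {item: rating} dicts.

    Items missing from one person count as a rating of 0.
    Returns a value in [0, 1] for non-negative ratings.
    """
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # someone with no ratings is similar to nobody
    return dot / (norm_a * norm_b)


def taste_similarity(a, b, domain=None):
    """Compare two people's ratings, optionally only within `domain` (a set of items)."""
    if domain is not None:
        a = {k: v for k, v in a.items() if k in domain}
        b = {k: v for k, v in b.items() if k in domain}
    return cosine_similarity(a, b)


alice = {"Radiohead": 5, "Bjork": 4, "Metallica": 1}
bob = {"Radiohead": 4, "Bjork": 5}
print(taste_similarity(alice, bob))
print(taste_similarity(alice, bob, domain={"Metallica"}))
```

The same function would work for any domain where choices can be turned into numbers (purchases, check-ins, links followed), which is part of why cross-referencing those datasets seems so promising.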

We've got all these taste measurements progressing, and we are somewhat successfully using them to provide recommendations to people. These are all relatively simple measures of the choices people are making: Last.fm, Reddit, our search histories, OkCupid, our social networks through Facebook and such, purchase histories with Amazon and such, etc. At the moment most of those are behind private walls and not easy to combine for greater explanatory power, but I feel that doing so has a lot of potential, especially if we manage to fold in things like cell-phone usage histories, browser histories, and credit card histories. We could be recommending music based on search histories, or predicting political alignment from purchasing history, or any of a dozen other things. I imagine we might be able to do a Minnesota Multiphasic Personality Inventory type thing with them and use that info to judge personality types. All this together would give us much more detailed pictures of people's lives, although it would still be incredibly messy and hard to sort through. Need to read up on MIT's Reality Mining.

What do we do with these next? Sadly, my knowledge of statistics is not quite where I would like it to be for answering these questions. Recommendations are predictions, in that they predict what people will like. If we can begin to predict what people will do in the future, that will be impressive. One good measure would be whether some people are leaders in changing tastes. With chagrin, I'm pulling up Malcolm Gladwell's The Tipping Point right now. Such a measure would probably combine a person's social influence with the predictiveness of their taste. Hrm...
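As a rough illustration of the "predictiveness of their taste" half of that idea, here is a hypothetical sketch: for each item a person adopted, count what fraction of the other adopters came after them, then average over items. It assumes each person adopts an item at most once, and it ignores social influence entirely, so it measures earliness rather than leadership. The function name, the `min_adopters` cutoff, and the data are all made up for illustration.

```python
def trendsetter_score(person, adoptions, min_adopters=2):
    """Hypothetical 'earliness' score in [0, 1] (illustrative sketch only).

    adoptions maps item -> list of (person, time) pairs, assuming each
    person adopts a given item at most once. For every item the person
    adopted, we take the fraction of other adopters who adopted it later,
    then average those fractions across items.
    """
    scores = []
    for item, events in adoptions.items():
        if len(events) < min_adopters:
            continue  # too few adopters to say anything about timing
        times = {p: t for p, t in events}
        if person not in times:
            continue
        later = sum(1 for p, t in events if p != person and t > times[person])
        scores.append(later / (len(events) - 1))
    return sum(scores) / len(scores) if scores else 0.0


adoptions = {
    "song1": [("ann", 1), ("bob", 2), ("cat", 3)],
    "song2": [("ann", 1), ("cat", 2)],
}
for name in ("ann", "bob", "cat"):
    print(name, trendsetter_score(name, adoptions))
```

A real measure would need to weigh this against how popular an item eventually became and whether the later adopters were actually connected to the early one, which is where the social-network data would come in.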

And in the last few paragraphs I've drifted away from my original idea of extracting simple, useful metrics from this million-dimensional space. There are a lot of questions I'd like to ask, but most are going to require their own ways of precisely defining what is meant by them. Ideally, I'd like simple measures of fame, reputation, popularity, desirability, social contribution, origination of ideas, political alignment, environmental impact and other things, potentially within a specific domain or relative to a given person or group of people. Sigh. Of course I'd like a computer program that could answer whatever questions I ask of it and give nice visualizations of the data, but that's about as practical as wishing for the moon. As I've been turning this over for hours without getting very far, I suppose I'll close off and see how I feel about it in the morning.

Oh! And following up on my last post, protesters managed to remove Mubarak from power in Egypt. Protests are moving on to Algeria and Yemen. Interesting times to be alive, and I expect much of the Middle East to be pretty damn different by the end of the year. Hopefully in a good way.