Thursday, December 15, 2011

Opposing SOPA and PIPA


I'm in a bit of a rush at the moment, but wanted to add my blog to the chorus of opposition to SOPA and PIPA heading through Congress. As the House Judiciary Committee is voting on SOPA today, now is the time for everyone out there to call their reps and oppose these bills.

Sunday, December 4, 2011

Klout and the Future of Whuffie

I like Klout, admittedly with a few reservations, realizing that it has a long way to go. It is something of an attempt to be a Whuffie system, and I have been writing about those for years. Klout, despite its flaws, gives a good example of how Whuffie systems might start out and where they might go in the future. Klout's stated goal is to measure one's online influence, and render that information as a number between 1 and 100. To quote their About page: "The Klout Score measures influence based on your ability to drive action. Every time you create content or engage you influence others." This year, Klout has hit public attention in the online world. There are a lot of people who think it is desperately important, and others that hate it, but Klout has at least succeeded in getting people's attention.

I do have several criticisms of Klout. I am curious as to how meaningful it is as a measurement in terms of real-world impact. What exactly does online influence mean, and should people care about it as much as some of them seem to? How exactly do you define influence at all? As far as I know, it is not easy to make tests where we ask people to attempt to influence the world in a particular way, and measure the result. Klout is only measuring comments, retweets, shares and such. It does appear to be correlated with real-world fame and influence, as can be seen from the fact that people with real-world fame tend to have high Klout scores. People are still trying to find out whether Klout is actually usable in real-world situations. There hasn't been anything approaching double-blind scientific tests, only approximations and Klout's marketing department trying to convince us that it is a real thing.

Klout's algorithm is secret, so for all we know a significant portion of the score could be a random number generator behind their black box. I might just be an openness and transparency fanatic, but I imagine that will be a significant weakness for Klout. There is no accountability or insight. Considering that the metric is their business, this is somewhat understandable, but for me at least, it is a hindrance. Klout is succeeding in getting attention right now, because it's the only real competitor in this space and no one is providing an alternative. We simply have to take their metric because it is the only one out there.

While one of Klout's benefits is simplicity of presentation, I wish that one could go into more detail with the score. Social influence, online or not, is incredibly complex. People can be hugely influential on one subject and not another; people might be more likely to influence those with similar tastes; there is a difference between influencing the world by creating new works and directing people to existing ones; and there are many other shades of 'influence' that Klout doesn't go into. Klout does attempt to make its metric more focused by listing individual topics, but I find that system to be largely irrelevant at the moment. Topics are limited to three per person and they are determined solely by what I believe is a natural language processing algorithm on their end. Mine are coffee and libraries and that is almost completely nonsensical. Up until a week ago, Klout thought I was an authority on Skynet. It will get better when people can suggest topics for others, and Klout has said that improvement will be here soon. It would also improve if there could be a greater number of topics, and if each had its own individual score. Klout also attempts to break the metric down into sub-factors of reach, amplification and a network's influence, but again, those are somewhat hand-wavy. There is a long way to go on all these fronts. We do not just need more accurate metrics, but to better define what we want to measure.

If Klout were a predictive metric instead of simply a number, it would at least provide an explicit connection to the world: if your score is x, you can, through y effort, change the world by z amount. Then the metric could be tested and refined. Unfortunately, you would then need metrics for x, y and z, and would have to measure and define a host of other items affecting the system. Despite the outpouring of data about people's social lives, the social sciences remain far from the hard sciences. Klout is just a first step. Perhaps if there were a way to spend Klout, that would give us an exchange rate to something tangible, but I'm not sure that even makes sense, despite the idea of 'reputation markets'. Reputation and influence are not easily transferable or fungible, so our handy metaphors are not much use. Could you measure a Gross National Attention and see where it is being spent?

Our shortage of metrics in the social sciences is so acute that people are starving for them, even if they are as imprecise and inaccurate as Klout. Klout is already showing itself to be significant, simply because it is used. I have seen a lot of articles this year by people ranting about how important Klout is, along the lines of 'Without Klout, Google+ is dead to me', and instances of people throwing fits with regard to their score. Given the tone used in those articles, it is easy to dismiss the people obsessing about Klout as twats, but Klout does appear to be a meaningful measure to some extent. How to make it more useful is a better question.

People, mainly marketers, are attempting to use the metric and measure the results. Dozens of companies are giving away perks to people with high Klout scores and certain topics of influence in the hopes that they will talk up their products and influence people to buy them. Audi appears to still be trying to figure out whether or not their participation in such a plan actually had results. There's a new idea to give people with high Klout equity in startups if they promote them. Again, there is a ways to go, but with money being poured in, people are going to want to be able to measure their return on investment.

As hungry as people are to use the Klout score for marketing, people are even more hungry to have a high Klout score, out of sheer simple human competitiveness. Klout encourages people to sign up additional services in order to get a higher score. Not everyone cares to do this of course, but certain competitive people like me are more than happy to. Thankfully, I've avoided sounding like a twat on a message board, but I can see some of where they are coming from. I could easily imagine people out there wanting to give Klout access to their emails and phone records just so that their connections to various people could be added to their score. As I wrote earlier, it gives people something to compete for in a way similar to money because it's a status metric.

Klout does indeed lead to some negative effects in the real world, beyond shrill comments on blogs. A lot of people have complained that it just creates an arbitrary pecking order for people to be pricks about, or a way to easily dismiss people without looking at them, and that is a danger. A great criticism of the social networks behind Klout, 'The Social Graph is Neither', does a good job of pointing out how far removed Facebook and such are from normal human interaction. Klout does certainly have a 'teach to the test' effect, encouraging online interactions for the sake of scoring points. I admit I have found myself influenced to tweet more simply with scores in mind. My response to those criticisms is to make the test better, instead of abolishing the test. That is why I am trying to better connect the metric with reality. Ideally, I want Klout, or a new Klout replacement, to be greatly improved, but I am thankful for it as a first step.

This has gotten quite long enough at this point, so I'm going to break up what I am writing into more articles. Next up will either be a deeper look at ways to improve social metrics, or a look at my favorite social network, Reddit, thoughts on improving it, and how it ties in with Klout.

Thursday, December 1, 2011

Reddit

Given how often I think about creating web services that would require collecting and analyzing massive amounts of data in order to provide the features I want, Reddit does a good job of playing devil's advocate, because it provides much of what I want while being incredibly simple. Reddit is the best source of link recommendations I have found, and it does almost nothing to provide what I dream of in creating the Platonic ideal of the social net. There are no personalized recommendations. It strictly avoids caring about users' real names, locations or other demographic data. Almost nothing is done to use or display social connections. None of the fancy stuff I dream of seeing. It is simple, and it works.

Having a functioning online community and also attempting to wring every last piece of data out of users are at least slightly contradictory goals. As much as I hate to admit it, I share some goals with Facebook. We both want to be able to analyze all the data coming out of social activity, though Facebook is in it for the advertising dollars, and I want it out of curiosity and potential academic reasons. As 'The Social Graph is Neither' points out, reducing social networks to programmatic code is awkward for a number of reasons: real life social connections are not easily categorized, many times explicitly declaring them is detrimental to the real world social connection, and the effort to record all the data can create an unnerving worry of privacy and voyeurism. You do not need to track and analyze every detail of people’s lives to create a useful social network.

Reddit explicitly avoids most of the fancy data processing I want to play with, has a very simple model, and it seems to work tremendously well. It might even succeed because of it. It manages to give me some of the best recommendations for what to read on the internet. One of its main competitors, Digg, implemented several of the features I want, and Reddit beat it into the ground. Yet I still wish to expand upon Reddit's model, to analyze and toy with the data. There are many things other than simplicity that differentiated Reddit from its competitors, and I hope there is a way to expand upon Reddit's functionality without bringing it crashing to the ground.

What I Would Like To Improve: 

Transparency

The general anonymity of Reddit can lead to some rather unsavory behavior, as seen in a row that popped up across the net last month. Yet a lot of people seem to like that anonymity, and it did help make Reddit what it is today. With usernames not tied to real names, it is easier for people to speak their minds. Minimizing the social network features makes it harder for mob mentalities to mass promote links. Unlike YouTube, the comments are (generally) not a trolling wasteland, giving credence to the idea that using real names does not do much to improve online interactions. A lot of people do feel Reddit is a community and spend the majority of their time on the site writing comments, as opposed to submitting links.

As for myself though, I am not a fan of the encouraged anonymity part of Reddit. Because of it, I only rarely get into discussions. Talking with a username with no human face does not give me any context and makes it hard to care about the discussion. When I get into an argument about politics I don't know if I am talking to a voter who actually believes what they are stating, some troll just being an ass, or a 15-year-old who just read 'Atlas Shrugged'. It is hard for me to feel that Reddit is a community instead of just a machine that spits out link recommendations.

I would loosen the encouraged anonymity and allow people to opt in to displaying their own personal information. Allow users to remain anonymous if they choose, but also allow accounts to be directly linked to Google+, Twitter, Facebook, etc., and encourage code to shuttle information back and forth between them. I want a Reddit where I could know more about the person or bot behind the accounts that were posting so as to have some more sense of community. Make it easier to find people I already know on the site, and make it easier to gather all the statistical data about where posts were coming from, the demographic breakdown of who was making them, where in the world they were coming from etc. I believe this would greatly encourage me to actually interact with and talk to people on Reddit.

Note that other sites have attempted this, namely Digg, and that led to some negative effects, the largest one being that groups of friends would vote in sync. Perhaps we could move to the system used in most modern republics, where the secret ballot was invented to discourage social pressure and the buying of votes. Currently on Reddit you are able to see every vote a user has ever cast. I would consider getting rid of that, but that would make it hard to do some of the changes I am about to get to.

Post Automation/Aggregation

I would loosen the spamming rule just a bit, and allow or encourage people and organizations to post their own stories. Make submission to Reddit easier to automate, so that posts I make to Twitter or Google+ would be automatically sent to Reddit. Do not discourage news websites from automatically sending all their stories to various subreddits. Essentially, I want a more universal view of what is popular on the internet right now. I want a more omniscient view of the net and encouraging more content to be dumped into Reddit is a good way of doing that. Of course, then you would need a better way to filter everything.

Apply Better Filters

I would not directly ban vote manipulation in the same manner Reddit currently does, but would instead try to get the software to weigh votes. This way one would not have a binary response to vote manipulation, but a more continuous approach. The system I imagine would by default give less weight to votes coming from bots, or accounts all voting in sync, or votes that have been paid for, but would acknowledge that there is potentially valuable opinion in these votes. Perhaps, yes, this group of 30 accounts voting in sync might all be sock-puppets run by one person, but instead of ban-hammering them, count all their votes together as the equivalent of one vote. I want to have a system where you don't need explicit, iron rules against spam and vote rigging, but a system that takes them into account and adjusts accordingly. You can have thousands of bots voting for every single person that votes, but if their votes mean nothing to you, they do not need to be deleted, they can just be ignored.
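To make the idea concrete, here is a minimal sketch of that kind of continuous weighting. Everything in it is my own invention for illustration: the function name, the `sync_group` mapping (which assumes some other piece of software has already detected accounts voting in lockstep), the `bots` set and the 0.1 bot weight. It is not how Reddit actually scores posts.

```python
from collections import defaultdict

def weighted_score(votes, sync_group, bots, bot_weight=0.1):
    """Score one post from (account, direction) pairs, direction being +1 or -1.

    sync_group: hypothetical mapping of account -> group id for accounts
                that were detected voting in sync.
    bots:       hypothetical set of accounts flagged as bots.
    """
    buckets = defaultdict(list)
    for account, direction in votes:
        weight = bot_weight if account in bots else 1.0
        # Accounts in a sync group share one bucket; everyone else gets their own.
        if account in sync_group:
            key = ("group", sync_group[account])
        else:
            key = ("solo", account)
        buckets[key].append(direction * weight)
    # Each bucket contributes the average of its votes, so 30 sock-puppets
    # voting together count the same as one ordinary voter.
    return sum(sum(values) / len(values) for values in buckets.values())


puppets = [f"puppet{i}" for i in range(30)]
votes = [(p, 1) for p in puppets] + [("ann", 1), ("ben", 1), ("cat", -1), ("bot7", 1)]
score = weighted_score(votes, {p: "ring1" for p in puppets}, bots={"bot7"})
print(round(score, 2))  # 2.1
```

Nothing gets deleted or banned here. The 30 puppets together add one vote, the bot adds a tenth of a vote, and the defaults could be tuned without touching the raw data.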

To continue in that direction, do not just make a distinction in weighting between people who are 'cheating' and those who are not, but between humans that have contributions of different value. Have a system that can measure the statistical similarity between how people vote and the links they submit, so that saying one wants to weigh someone's vote higher causes more of what they like to be recommended to you.
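One simple way to measure how alike two people vote, offered only as a sketch, is cosine similarity over the posts both of them have voted on. The vote dictionaries and the function name below are assumptions of mine, and a real system would probably want something more robust when the overlap is tiny.

```python
import math

def vote_similarity(votes_a, votes_b):
    """Cosine similarity between two users' votes, over posts both voted on.

    Each argument maps post id -> +1 (upvote) or -1 (downvote).
    Returns a value in [-1, 1], or 0.0 when there is no overlap.
    """
    shared = votes_a.keys() & votes_b.keys()
    if not shared:
        return 0.0
    dot = sum(votes_a[p] * votes_b[p] for p in shared)
    norm_a = math.sqrt(sum(votes_a[p] ** 2 for p in shared))
    norm_b = math.sqrt(sum(votes_b[p] ** 2 for p in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


me = {"p1": 1, "p2": 1, "p3": -1}
other = {"p1": 1, "p2": -1, "p3": -1, "p4": 1}
print(vote_similarity(me, other))  # agree on 2 of 3 shared posts, so about 0.33
```

A similarity like this could then feed into how heavily someone's votes count toward what gets recommended to me.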

Furthermore, allow users to customize these weights themselves, and allow them to alter the weights at the flip of a switch. I want to be able to weigh votes and shift that weighting so that I could read different types of opinions. Given default laziness in most users, you would want to create default weightings that would have a similar effect to the current policies of reducing spam and vote manipulation, but make them easy to modify. Allow me to tailor my own basic weightings, so that I can specify how much I like various posters, how much I like various sources and so on. You could even have people's votes count inversely. As a liberal, I might want to tag a conservative as someone whose vote counts negative to me.
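As a rough sketch of per-reader weights, including negative ones, the same pile of votes can simply be scored against whichever weighting profile the reader has switched on. The profile names and numbers below are made up for illustration.

```python
def personal_score(votes, my_weights, default_weight=1.0):
    """Score a post for one reader.

    votes:      list of (voter, direction) pairs, direction being +1 or -1.
    my_weights: the reader's own voter -> weight mapping; a negative weight
                means that voter's upvote counts against the post for me.
    """
    return sum(direction * my_weights.get(voter, default_weight)
               for voter, direction in votes)


votes = [("alice", 1), ("bob", 1), ("carol", -1)]

# Two hypothetical profiles a reader could flip between.
profiles = {
    "usual": {"alice": 2.0, "bob": -1.0},
    "other_side": {"alice": -1.0, "bob": 2.0},
}
print(personal_score(votes, profiles["usual"]))       # 2 - 1 - 1 = 0.0
print(personal_score(votes, profiles["other_side"]))  # -1 + 2 - 1 = 0.0
print(personal_score(votes, {}))                      # everyone counts equally: 1.0
```

The site-wide defaults would just be one more profile, used for anyone who never bothers to change it.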

Ideally I could easily escape my own echo chamber by flipping a dial to see what conservative people were reading. I would like to flip between what people my age were reading and other generations, what fellow Americans were reading vs. other countries, what my friends were reading vs. strangers and many other variables. Even better, allow me to say that this other user is to the right, this other user is to the left, and have the system calculate, based on everything they have voted on, everything that people with voting patterns similar to them have voted on, and so forth, and then sort every post along that axis. That is the sort of power I want when using Reddit.
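Here is one hand-wavy way the left/right axis could be computed, as a sketch only. I tag one user as the 'left' anchor and one as the 'right' anchor, place every other user on the axis by how much more they agree with one anchor than the other, and place each post at the average position of its upvoters. The agreement measure and all the names are assumptions for illustration, and this version only looks one step out rather than following chains of similar voters.

```python
from collections import defaultdict

def agreement(votes_a, votes_b):
    """Average agreement in [-1, 1] over posts both users voted on."""
    shared = votes_a.keys() & votes_b.keys()
    if not shared:
        return 0.0
    return sum(votes_a[p] * votes_b[p] for p in shared) / len(shared)

def sort_posts_along_axis(all_votes, left_user, right_user):
    """all_votes maps user -> {post id: +1 or -1}. Returns post ids, left to right."""
    left, right = all_votes[left_user], all_votes[right_user]
    # Positive position = votes more like the right anchor, negative = more like the left.
    position = {user: agreement(v, right) - agreement(v, left)
                for user, v in all_votes.items()}
    total, count = defaultdict(float), defaultdict(int)
    for user, user_votes in all_votes.items():
        for post, direction in user_votes.items():
            if direction > 0:  # a post sits where its upvoters sit
                total[post] += position[user]
                count[post] += 1
    return sorted(total, key=lambda post: total[post] / count[post])


all_votes = {
    "lefty":  {"a": 1, "b": 1, "c": -1},
    "righty": {"a": -1, "b": 1, "c": 1},
    "u1":     {"a": 1, "b": 1},
    "u2":     {"c": 1, "b": 1},
}
print(sort_posts_along_axis(all_votes, "lefty", "righty"))  # ['a', 'b', 'c']
```

Post "b", which both anchors upvoted, lands in the middle, which is about what I would hope for from something everyone likes.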

Another similar change I might like to make would be to move away from the independent subreddit model. I think it should be easier to post links in multiple subreddits at once so as to be able to see how the same post fares in different subreddits. Essentially, I do not want the discrete distinction between various subreddits to be the final arbiter in how something is seen. I want to be able to use a subreddit as just another type of filter or tag, and for Reddit to be able to recommend links to me in subreddits I have never seen.

Why More Isn't Done With Reddit's API

When I started writing this article more than a year ago, I was under the mistaken impression that Reddit did not have a public API, and that this was what was preventing more sites from interfacing with it. It turns out Reddit has a quite thorough API. I was stunned by this because with an open API like that, I would expect Reddit to be overrun by bots. And indeed, it does have something of a problem with that, but not as large as I would expect. There are websites, which I won't link to, that will sell you upvotes. I'm surprised that every news site doesn't have code that automatically sends whatever they publish to Reddit, despite the ban on self-promotion. True, the rules largely prohibit that, but I would think it would be beneficial enough to get these results that people would be hiding their spam behind various randomizing filters. There are certainly bots and spammers on Reddit, but they have not overwhelmed the site. On the whole it seems Reddit is mainly run by humans. Perhaps it is just the hard work of the moderators of each of the various subreddits that keeps things from being overrun by bots.

I'm surprised more isn't being done to analyze all the data there is. Instead the main uses of the API are browser extensions like the Reddit Enhancement Suite and smartphone apps. There are bits of JavaScript to allow blog articles to be submitted and upvoted from external websites, but that is a far smaller degree of integration than I would imagine. Nothing that exciting, though useful and nice. Again, perhaps the lesson is to stay within the bounds of practicality. Maybe the main thing protecting the humanity of Reddit is not the moderators, but the fact that Reddit is not as popular as I would imagine it to be, and that the cost of the code I am imagining is simply not worth the price.

Potential Consequences and What Might Be Done To Compensate 

For the purposes of this thought experiment, I am imagining a future internet where Reddit is big and popular enough that mere human moderators cannot sweep back the bots, and the only solution is to invent smarter filters. The changes I am imagining being made to improve Reddit would still make several problems more likely. The anonymity and lack of social connections in Reddit discourage the mob mentality that was one of the issues that broke Digg. Weighing votes based on their popularity could create positive-feedback loops, leading to a situation where a small number of users dominate what is highly ranked on the site.

We would need to be very careful with how votes were weighed. Ideally you would have something where the value of someone's opinion was based on a combination of how other people valued it, how close they were to your social circle, and how similar that opinion was to yours.
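As a sketch of what such a combination might look like, the simplest thing I can think of is a weighted blend of the three ingredients. The 0-to-1 reputation scale, the 'hops' measure of social distance, the mix of 0.4/0.3/0.3 and the function name are all assumptions of mine, chosen only to make the idea concrete.

```python
def opinion_weight(reputation, hops, similarity, mix=(0.4, 0.3, 0.3)):
    """Blend three signals into one weight for another user's votes.

    reputation: how other people value this user, assumed scaled to 0..1.
    hops:       steps between us in the social graph (1 = direct friend),
                or None if there is no known connection.
    similarity: how closely their votes match mine, in -1..1.
    mix:        how much each ingredient counts; these numbers are arbitrary.
    """
    closeness = 0.0 if hops is None else 1.0 / (1 + hops)
    w_reputation, w_closeness, w_similarity = mix
    return (w_reputation * reputation
            + w_closeness * closeness
            + w_similarity * similarity)


print(round(opinion_weight(reputation=0.5, hops=1, similarity=1.0), 2))     # 0.65
print(round(opinion_weight(reputation=0.9, hops=None, similarity=-1.0), 2))  # 0.06
```

Capping how much any one ingredient can contribute, popularity especially, would be one lever against the feedback loops mentioned above.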

The default weighting systems would have a tremendous amount of influence on how the site felt to average users. I would hope that allowing users to customize their weighting systems would create enough diversity to avoid a single broken weighting setting the entire site off kilter. I also would not have only a single default weighting for new users, but would assign different values, both to create more diversity, and for the purpose of A/B testing.
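Assigning different defaults to new users could be as simple as hashing the username into a bucket, so that the same person always lands on the same default and the groups can be compared later. This is only a sketch, and the profile names are placeholders I made up.

```python
import hashlib

# Hypothetical default weighting profiles being tested against each other.
DEFAULT_PROFILES = ["strict_antispam", "social_circle_heavy", "similarity_heavy"]

def default_profile_for(username, profiles=DEFAULT_PROFILES):
    """Deterministically pick a default weighting profile for a new user."""
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a stable number for this username.
    bucket = int.from_bytes(digest[:8], "big") % len(profiles)
    return profiles[bucket]


print(default_profile_for("some_new_user"))
```

Because the assignment depends only on the username, nothing needs to be stored up front, and a user who later customizes their weights simply stops using the default.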

Allowing people to tune their voting systems will likely encourage echo chambers, but at least if you give everyone their own weighting system, it allows the echo chambers to be personalized. And I would want to provide easy-to-use options that let users choose to see posts from opposing viewpoints that were popular enough. No guarantee that people would still want to do that, but making it easy would at least encourage that behavior.

I want a universal, continuous, automated system, and I know that would require a tremendous amount of data and computation to execute in a usable manner. A significant part of what I want involves statistics that I am just being hand-wavy about, because I have not thoroughly studied them. The UX design alone would be a tremendous headache. I admit that I am not sure how practical these ideas might be, but it is what I would aim for. However, Reddit has approximately 20 employees, and their existing, simple model does not overreach. Perhaps the lesson there is to live within the bounds of practicality.

Ordering the Chaos 

Reddit makes a virtue of its simplicity, and here I want to apply more fancy software to something that is already working. But that way lies the future! It might not be easy to alter Reddit to improve it, and most of the possible changes would make it worse, but there are still many improvements that might be made. I want to see the entire net translated into some Reddit/Twitter/Google+ hybrid, and easily see what is popular right now among either my social circle, humanity at large, or some specific subsection. I want to be able to easily see what people are saying about various subjects, and divide what they're saying by whatever demographic I choose. I want neat statistical maps of how memes are flowing. I want an idealized universal perfect internet.

I suppose my dream for a more universal, more analyzed, tune-able Reddit is close to how Google+ is supposed to work. I look at Google+ and see how it technically has most everything Reddit has, save for downvoting. I use Google+ in mostly the same way I use Reddit, in that I +1 pages on Google+ the same way I upvote pages on Reddit, and I share a page on Google+ like I submit a page to Reddit. From my perspective the only differences between the two are that Google+ unsurprisingly has a better Chrome extension to make it easier to submit pages to Google+, and that Reddit has significantly more users and user activity. Not surprisingly, I spend more time on Reddit. Google+ is also different from Reddit in that it doesn't have Reddit's user culture, but I don't really care about that, aside from the link recommendations that come out of it. I just see two different websites that I have to bounce back and forth between, when if they were combined the whole might be greater than the sum of their parts.

The larger picture is that I am not just looking for how to make one website better. I don't want to just have to go to one site. I want the entire internet to be behaving by the set of rules that I've got in my head, with the good features of Reddit, Google+, etc., shared universally. I don't really care about the local culture of some section of the net. I just want a Platonic ideal of the net, and a pretty data visualization layer over the top. If we could get everyone using open standards like OpenSocial and such, and allow people to easily personalize how they see all the data out there... Well, I might as well wish for a third party candidate to have a chance at winning the presidency.

In the meantime, for the purposes of having links recommended to me, Reddit is the best site on the net. This may be mainly because of its users, and those users may be due to Reddit’s simplicity. Still, Reddit is a long ways off from where I dream of being. But Reddit is a step closer, and a lot more useful to me, than most other things out there.

Notes:

http://blog.pinboard.in/2011/11/the_social_graph_is_neither/
https://gawker.com/5950981/unmasking-reddits-violentacrez-the-biggest-troll-on-the-web
http://www.reddit.com/rules
https://news.ycombinator.com/item?id=231168
https://github.com/reddit/reddit/wiki/API
https://en.wikipedia.org/wiki/OpenSocial
http://www.quora.com/What-are-the-specifics-of-creating-a-content-discovery-algorithm-as-in-Digg-Reddit-or-Hacker-News
http://www.quora.com/How-do-the-Digg-and-Reddit-algorithms-work-and-how-do-they-compare-in-their-effectiveness-to-fish-out-the-best-content
http://techcrunch.com/2012/07/29/surprisingly-good-evidence-that-real-name-policies-fail-to-improve-comments/