Thursday, December 1, 2011

Reddit

Given how often I think about creating web services that would require collecting and analyzing massive amounts of data in order to provide the features I want, Reddit does a good job of playing devil's advocate, because it provides much of what I want while being incredibly simple. Reddit is the best source of link recommendations I have found, and it does almost nothing to provide what I dream of in creating the Platonic ideal of the social net. There are no personalized recommendations. It strictly avoids caring about users' real names, locations or other demographic data. Almost nothing is done to use or display social connections. None of the fancy stuff I dream of seeing. It is simple, and it works.

Having a functioning online community and also attempting to wring every last piece of data out of users are at least slightly contradictory goals. As much as I hate to admit it, I share some goals with Facebook. We both want to be able to analyze all the data coming out of social activity, though Facebook is in it for the advertising dollars, and I want it out of curiosity and potential academic reasons. As 'The Social Graph is Neither' points out, reducing social networks to programmatic code is awkward for a number of reasons: real life social connections are not easily categorized, many times explicitly declaring them is detrimental to the real world social connection, and the effort to record all the data can create an unnerving worry of privacy and voyeurism. You do not need to track and analyze every detail of people’s lives to create a useful social network.

Reddit explicitly avoids most of the fancy data processing I want to play with, has a very simple model, and it seems to work tremendously well. It might even succeed because of it. It manages to give me some of the best recommendations for what to read on the internet. One of its main competitors, Digg, implemented several of the features I want, and Reddit beat it into the ground. Yet I still wish to expand upon Reddit's model, to analyze and toy with the data. There are many things other than simplicity that differentiated Reddit from its competitors, and I hope there is a way to expand upon Reddit's functionality without bringing it crashing to the ground.

What I Would Like To Improve: 

Transparency

The general anonymity of Reddit can lead to some rather unsavory behavior, as seen in a row that popped up across the net last month. Yet a lot of people seem to like that anonymity, and it did help make Reddit what it is today. With usernames not tied to real names, it is easier for people to speak their minds. Minimizing the social network features makes it harder for mob mentalities to mass promote links. Unlike YouTube, the comments are (generally) not a trolling wasteland, giving credence to the idea that using real names does not do much to improve online interactions. A lot of people do feel Reddit is a community and spend the majority of their time on the site writing comments, as opposed to submitting links.

As for myself though, I am not a fan of the encouraged anonymity part of Reddit. Because of it, I only rarely get into discussions. Talking with a username with no human face does not give me any context and makes it hard to care about the discussion. When I get into an argument about politics I don't know if I am talking to a voter who actually believes what they are stating, some troll just being an ass, or a 15-year-old who just read 'Atlas Shrugged'. It is hard for me to feel that Reddit is a community instead of just a machine that spits out link recommendations.

I would loosen the encouraged anonymity and allow people to opt in to displaying their own personal information. Allow users to remain anonymous if they choose, but also allow accounts to be directly linked to Google+, Twitter, Facebook, etc., and encourage code to shuttle information back and forth between them. I want a Reddit where I could know more about the person or bot behind the accounts that were posting, so as to have some more sense of community. Make it easier to find people I already know on the site, and make it easier to gather all the statistical data about the posts: the demographic breakdown of who was making them, where in the world they were coming from, and so on. I believe this would greatly encourage me to actually interact with and talk to people on Reddit.

Note that other sites have attempted this, namely Digg, and that led to some negative effects, the largest one being that groups of friends would vote in sync. Perhaps we could move to the system used in most modern republics, where the secret ballot was invented to discourage social pressure and the buying of votes. Currently on Reddit you are able to see every vote a user has ever cast. I would consider getting rid of that, but that would make it hard to do some of the changes I am about to get to.

Post Automation/Aggregation

I would loosen the spamming rule just a bit, and allow or encourage people and organizations to post their own stories. Make submission to Reddit easier to automate, so that posts I make to Twitter or Google+ would be automatically sent to Reddit. Do not discourage news websites from automatically sending all their stories to various subreddits. Essentially, I want a more universal view of what is popular on the internet right now. I want a more omniscient view of the net and encouraging more content to be dumped into Reddit is a good way of doing that. Of course, then you would need a better way to filter everything.

Apply Better Filters

I would not directly ban vote manipulation in the same manner Reddit currently does, but would instead try to get the software to weigh votes. This way one would not have a binary response to vote manipulation, but a more continuous approach. The system I imagine would by default give less weight to votes coming from bots, or accounts all voting in sync, or votes that have been paid for, but would acknowledge that there is potentially valuable opinion in these votes. Perhaps, yes, this group of 30 accounts voting in sync might all be sock-puppets run by one person, but instead of ban-hammering them, count all their votes together as the equivalent of one vote. I want to have a system where you don't need explicit, iron rules against spam and vote rigging, but a system that takes them into account and adjusts accordingly. You can have thousands of bots voting for every single person that votes, but if their votes mean nothing to you, they do not need to be deleted, they can just be ignored.
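A minimal sketch of the "30 accounts count as one vote" idea, in Python. Everything here is an assumption made for illustration: the function name, the `cluster_of` mapping, and the choice to average a cluster's votes are hypothetical, and actually detecting which accounts vote in sync is a separate and much harder problem that the sketch does not attempt.

```python
from collections import defaultdict


def weighted_score(votes, cluster_of):
    """Score one post so that each cluster of in-sync accounts counts as one vote.

    votes: dict of account name -> +1 (upvote) or -1 (downvote).
    cluster_of: dict of account name -> cluster id, for accounts judged to be
        voting in sync (hypothetical input; how to detect this is not shown).
    """
    cluster_votes = defaultdict(list)
    for account, vote in votes.items():
        # An account that is in no cluster forms a cluster of its own.
        key = cluster_of.get(account, ("solo", account))
        cluster_votes[key].append(vote)
    # Each cluster contributes the average of its members' votes,
    # so 30 synchronized upvotes add up to exactly one upvote.
    return sum(sum(vs) / len(vs) for vs in cluster_votes.values())


# Thirty sock puppets upvote, one independent user upvotes, one downvotes.
votes = {f"sock{i}": 1 for i in range(30)}
votes.update({"alice": 1, "bob": -1})
ring = {f"sock{i}": "ring" for i in range(30)}
print(weighted_score(votes, ring))  # 1.0, instead of the raw total of 30
```

Nothing is banned or deleted here: the puppet votes are still recorded, they simply stop dominating the total.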

To continue in that direction, do not just make a distinction in weighting between people who are 'cheating' and those who are not, but between humans that have contributions of different value. Have a system that can measure the statistical similarity between how people vote and the links they submit, so that saying one wants to weigh someone's vote higher causes more of what they like to be recommended to you.
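One simple way to measure how similarly two people vote is cosine similarity over their vote histories. The sketch below is only an illustration of that one technique, not a claim about how Reddit or any other site does it; the function name and the data layout (a dict of post id to +1 or -1) are assumptions.

```python
import math


def vote_similarity(a, b):
    """Cosine similarity between two users' vote histories.

    a, b: dict of post id -> +1 (upvote) or -1 (downvote).
    A post only one of them voted on counts as 0 for the other.
    Returns a value between -1 (always disagree) and 1 (always agree).
    """
    shared = set(a) & set(b)
    dot = sum(a[post] * b[post] for post in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # someone with no votes is similar to nobody
    return dot / (norm_a * norm_b)


me = {"p1": 1, "p2": 1, "p3": -1}
twin = {"p1": 1, "p2": 1, "p3": -1}
opposite = {"p1": -1, "p2": -1, "p3": 1}
print(vote_similarity(me, twin), vote_similarity(me, opposite))
```

A recommender could then weigh each account's votes by its similarity to an account the reader has said they trust, which is roughly the effect described above.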

Furthermore, allow users to customize these weights themselves, and allow them to alter the weights at the flip of a switch. I want to be able to weigh votes and shift that weighting so that I could read different types of opinions. Given default laziness in most users, you would want to create default weightings that would have a similar effect to the current policies of reducing spam and vote manipulation, but make them easy to modify. Allow me to tailor my own basic weightings, so that I can specify how much I like various posters, how much I like various sources and so on. You could even have people's votes count inversely. As a liberal, I might want to tag a conservative as someone whose vote counts negative to me.
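Per-user weights, including negative ones, are easy to sketch. The names below (`personal_score`, `rank_posts`, `my_weights`) are hypothetical, and a real system would also need the default weightings and spam handling discussed above; this only shows how one reader's weights could reorder the same set of votes.

```python
def personal_score(votes, my_weights, default_weight=1.0):
    """Score one post for one reader.

    votes: dict of account -> +1 or -1 on this post.
    my_weights: dict of account -> how much this reader values that account.
        A negative weight inverts the account's vote; 0 ignores it.
    """
    return sum(vote * my_weights.get(account, default_weight)
               for account, vote in votes.items())


def rank_posts(posts, my_weights):
    """Return post ids ordered from highest to lowest personal score."""
    return sorted(posts,
                  key=lambda pid: personal_score(posts[pid], my_weights),
                  reverse=True)


posts = {
    "tax-policy": {"carl": 1, "dana": 1, "erin": -1},
    "bike-lanes": {"carl": -1, "erin": 1},
}
# This reader doubles erin's votes and counts carl's votes inversely.
print(rank_posts(posts, {"erin": 2.0, "carl": -1.0}))  # ['bike-lanes', 'tax-policy']
# With no custom weights, every vote counts once.
print(rank_posts(posts, {}))  # ['tax-policy', 'bike-lanes']
```

Switching between saved `my_weights` dictionaries would be the "flip of a switch": the votes stay the same and only the reader's lens changes.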

Ideally I could easily escape my own echo chamber by flipping a dial to see what conservative people were reading. I would like to flip between what people my age were reading and other generations, what fellow Americans were reading vs. other countries, what my friends were reading vs. strangers and many other variables. Even better, allow me to say that this other user is to the right, this other user is to the left, and have the system calculate, based on everything they have voted on, everything that people with voting patterns similar to them have voted on, and so forth, and then sort every post along that axis. That is the sort of power I want when using Reddit.
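Sorting posts along an axis defined by two example users could be sketched as follows. This is a deliberately crude, one-step version under stated assumptions: each voter gets a "lean" equal to their cosine similarity to the right-hand example minus their similarity to the left-hand example, and each post is placed at the average of vote times lean. It does not follow the "and so forth" chain of similar users any further, and all names are hypothetical.

```python
import math


def similarity(a, b):
    """Cosine similarity between two vote histories (post id -> +1 or -1)."""
    dot = sum(a[p] * b[p] for p in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def sort_along_axis(history, left_user, right_user):
    """Order post ids from the 'left' end of the axis to the 'right' end.

    history: dict of user -> {post id: +1 or -1}.
    left_user, right_user: the two accounts the reader tagged as the poles.
    """
    # Negative lean: votes like the left example. Positive: like the right.
    lean = {user: similarity(votes, history[right_user])
                  - similarity(votes, history[left_user])
            for user, votes in history.items()}
    totals, counts = {}, {}
    for user, votes in history.items():
        for post, vote in votes.items():
            # An upvote pulls the post toward the voter's side,
            # a downvote pushes it toward the other side.
            totals[post] = totals.get(post, 0.0) + vote * lean[user]
            counts[post] = counts.get(post, 0) + 1
    position = {post: totals[post] / counts[post] for post in totals}
    return sorted(position, key=position.get)


history = {
    "lefty": {"a": 1, "b": -1},
    "righty": {"a": -1, "b": 1},
    "carol": {"a": 1, "c": 1},   # agrees with lefty on "a"
    "dave": {"b": 1, "d": 1},    # agrees with righty on "b"
}
print(sort_along_axis(history, "lefty", "righty"))  # ['a', 'c', 'd', 'b']
```

Posts "c" and "d" were never touched by either example user, yet they still land on the axis because of who else voted for them, which is the point of the exercise.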

Another similar change I might like to make would be to move away from the independent subreddit model. I think it should be easier to post links in multiple subreddits at once so as to be able to see how the same post fares in different subreddits. Essentially, I do not want the discrete distinction between various subreddits to be the final arbiter in how something is seen. I want to be able to use a subreddit as just another type of filter or tag, and for Reddit to be able to recommend links to me in subreddits I have never seen.

Why More Isn't Done With Reddit's API

When I started writing this article more than a year ago, I was under the mistaken impression that Reddit did not have a public API, and that this was what was preventing more sites from interfacing with it. It turns out Reddit has a quite thorough API. I was stunned by this because with an open API like that, I would expect Reddit to be overrun by bots. And indeed, it does have something of a problem with that, but not as large as I would expect. There are websites, which I won't link to, that will sell you upvotes. I'm surprised that every news site doesn't have code that automatically sends whatever they publish to Reddit, despite the ban on self-promotion. True, the rules largely prohibit that, but I would think it would be beneficial enough to get these results that people would be hiding their spam behind various randomizing filters. There are certainly bots and spammers on Reddit, but they have not overwhelmed the site. On the whole it seems Reddit is mainly run by humans. Perhaps it is just the hard work of the moderators of each of the various subreddits that keeps things from being overrun by bots.

I'm surprised more isn't being done to analyze all the data there is. Instead the main uses of the API are browser extensions like the Reddit Enhancement Suite and smartphone apps. There are bits of JavaScript to allow blog articles to be submitted and upvoted from external websites, but that is a far smaller degree of integration than I would imagine. Nothing that exciting, though useful and nice. Again, perhaps the lesson is to stay within the bounds of practicality. Maybe the main thing protecting the humanity of Reddit is not the moderators, but the fact that Reddit is not as popular as I would imagine it to be, and that the cost of the code I am imagining is simply not worth the price.

Potential Consequences and What Might Be Done To Compensate 

For the purposes of this thought experiment, I am imagining a future internet where Reddit is big and popular enough that mere human moderators cannot sweep back the bots, and the only solution is to invent smarter filters. The changes I am imagining being made to improve Reddit would still make several problems more likely. The anonymity and lack of social connections in Reddit discourage the mob mentality that was one of the issues that broke Digg. Weighing votes based on their popularity could create positive-feedback loops, leading to a situation where a small number of users dominate what is highly ranked on the site.

We would need to be very careful with how votes were weighed. Ideally you would have something where the value of someone's opinion was based on a combination of how other people valued it, how close they were to your social circle, and how similar that opinion was to yours.
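The simplest possible version of that combination is a weighted average of the three signals. The sketch below assumes each signal has already been scaled to the range 0 to 1, and the equal default `mix` is an arbitrary placeholder, not a recommendation; choosing those coefficients well is exactly the part that needs care.

```python
def opinion_weight(reputation, social_closeness, similarity,
                   mix=(1 / 3, 1 / 3, 1 / 3)):
    """Blend three signals about another user into one vote weight.

    reputation: how other people value this user's opinions (0 to 1).
    social_closeness: how close they are to the reader's social circle (0 to 1).
    similarity: how similar their opinions are to the reader's (0 to 1).
    mix: hypothetical coefficients for the three signals; they should sum to 1
        so the result also stays between 0 and 1.
    """
    for value in (reputation, social_closeness, similarity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("each signal must be between 0 and 1")
    w_rep, w_social, w_sim = mix
    return w_rep * reputation + w_social * social_closeness + w_sim * similarity


# A well-regarded stranger whose views half match the reader's,
# with reputation given half of the total influence.
print(opinion_weight(1.0, 0.0, 0.5, mix=(0.5, 0.25, 0.25)))  # 0.625
```

Keeping the reputation coefficient well below 1 is one way to limit the feedback loop where already popular users become ever more influential.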

The default weighting systems would have a tremendous amount of influence on how the site felt to average users. I would hope that allowing users to customize their weighting systems would create enough diversity to keep any single broken weighting from setting the entire site off kilter. I also would not have only a single default weighting for new users, but would assign different values, both to create more diversity, and for the purpose of A/B testing.

Allowing people to tune their voting systems will likely encourage echo chambers, but at least if you give everyone their own weighting system, it allows the echo chambers to be personalized. And I would want to provide easy-to-use options that let users choose to see sufficiently popular posts from opposing viewpoints. There is no guarantee that people would want to do that, but making it easy would at least encourage that behavior.

I want a universal, continuous, automated system, and I know that would require a tremendous amount of data and computation to execute in a usable manner. A significant part of what I want involves statistics that I am just being hand-wavy about, because I have not thoroughly studied them. The UX design alone would be a tremendous headache. I admit that I am not sure how practical these ideas might be, but it is what I would aim for. However, Reddit has approximately 20 employees, and their existing, simple model does not overreach. Perhaps the lesson there is to live within the bounds of practicality.

Ordering the Chaos 

Reddit makes a virtue of its simplicity, and here I want to apply more fancy software to something that is already working. But that way lies the future! It might not be easy to alter Reddit to improve it, and most of the possible changes would make it worse, but there are still many improvements that might be made. I want to see the entire net translated into some Reddit/Twitter/Google+ hybrid, and easily see what is popular right now among either my social circle, humanity at large, or some specific subsection. I want to be able to easily see what people are saying about various subjects, and divide what they're saying by whatever demographic I choose. I want neat statistical maps of how memes are flowing. I want an idealized universal perfect internet.

I suppose my dream for a more universal, more analyzed, tune-able Reddit is close to how Google+ is supposed to work. I look at Google+ and see how it technically has most everything Reddit has, save for downvoting. I use Google+ in mostly the same way I use Reddit, in that I +1 pages on Google+ the same way I upvote pages on Reddit, and I share a page on Google+ like I submit a page to Reddit. From my perspective the only differences between the two are that Google+ unsurprisingly has a better Chrome extension to make it easier to submit pages to Google+, and that Reddit has significantly more users and user activity. Not surprisingly, I spend more time on Reddit. Google+ is also different from Reddit in that it doesn't have Reddit's user culture, but I don't really care about that, aside from the link recommendations that come out of it. I just see two different websites that I have to bounce back and forth between, when if they were combined the whole might be greater than the sum of their parts.

The larger picture is that I am not just looking for how to make one website better. I don't want to just have to go to one site. I want the entire internet to be behaving by the set of rules that I've got in my head, with the good features of Reddit, Google+, etc., shared universally. I don't really care about the local culture of some section of the net. I just want a Platonic ideal of the net, and a pretty data visualization layer over the top. If we could get everyone using open standards like OpenSocial and such, and allow people to easily personalize how they see all the data out there... Well, I might as well wish for a third party candidate to have a chance at winning the presidency.

In the meantime, for the purposes of having links recommended to me, Reddit is the best site on the net. This may be mainly because of its users, and those users may be due to Reddit’s simplicity. Still, Reddit is a long ways off from where I dream of being. But Reddit is a step closer, and a lot more useful to me, than most other things out there.

Notes:

http://blog.pinboard.in/2011/11/the_social_graph_is_neither/
https://gawker.com/5950981/unmasking-reddits-violentacrez-the-biggest-troll-on-the-web
http://www.reddit.com/rules
https://news.ycombinator.com/item?id=231168
https://github.com/reddit/reddit/wiki/API
https://en.wikipedia.org/wiki/OpenSocial
http://www.quora.com/What-are-the-specifics-of-creating-a-content-discovery-algorithm-as-in-Digg-Reddit-or-Hacker-News
http://www.quora.com/How-do-the-Digg-and-Reddit-algorithms-work-and-how-do-they-compare-in-their-effectiveness-to-fish-out-the-best-content
http://techcrunch.com/2012/07/29/surprisingly-good-evidence-that-real-name-policies-fail-to-improve-comments/
