I shipped the Web of Trust ranking algorithm. You can view it at https://stacker.news/wot. You'll either have to click that link or manually type /wot into the URL bar. Eventually this ranking algo will go live on the main feed, but for now I'll just be evaluating its performance to make sure it strictly increases signal.
For the main feed, we need more defense against Sybils than sats at reasonable amounts can provide. As described in [RFC] Ranking SN posts, we either need to quantify trust or develop a model that's trustless. In the end, we felt a trust-based model was better for a community than a trustless one - although, in the future, some subs (job boards, product hunts, etc.) might be given trustless ranking algorithms.
Each user is a node in a graph, and directed edges in the graph represent how much one user, Alice, trusts another user, Bob, on a scale of 0-0.9. For now, we determine this edge value based on how many upvotes Alice has given Bob. We then use the model described in Modelling a Public-Key Infrastructure (1996) to assign each user a single trust score of 0-0.9, where the source node is the moderator of the sub (i.e. me on the main feed). We compute these trust scores daily.
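The post doesn't spell out how upvote counts map to an edge value, so here's a minimal sketch of one plausible mapping - a saturating ratio with a 0.9 cap. The `saturation` parameter and the shape of the curve are assumptions, not the actual implementation:

```python
def edge_trust(upvotes, saturation=10, cap=0.9):
    """Map Alice->Bob upvote count to an edge trust in [0, cap].
    This saturating-ratio mapping is hypothetical; the post only says
    the edge is 'based on how many upvotes Alice has given Bob'."""
    return cap * min(upvotes, saturation) / saturation

edge_trust(0)   # no upvotes, no trust
edge_trust(5)   # partial trust
edge_trust(50)  # capped at 0.9
```

Any monotone mapping bounded below 1 would fit the described model; the cap at 0.9 matters so that no single edge ever conveys full trust.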
That trust score is then used to weight a user's upvote in the ranking algorithm. For instance, if Alice has a trust score of 0.9 and Bob has a trust score of 0.45, Bob's upvotes count for half as much in ranking. Carol, a Sybil created by Bob who hasn't received any upvotes, has a trust score of 0, so her upvotes don't count at all.
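The weighting above amounts to summing upvoters' trust scores instead of counting raw upvotes. A minimal sketch, with the trust values taken from the Alice/Bob/Carol example (the `trust` dict is illustrative, not real data):

```python
# Hypothetical trust scores (0-0.9) from the daily WoT computation
trust = {"alice": 0.9, "bob": 0.45, "carol": 0.0}

def weighted_upvotes(upvoters, trust):
    """Sum each upvoter's trust score instead of counting raw upvotes.
    Unknown or zero-trust users (e.g. fresh Sybils) contribute nothing."""
    return sum(trust.get(user, 0.0) for user in upvoters)

score = weighted_upvotes(["alice", "bob", "carol"], trust)  # 0.9 + 0.45 + 0.0
```

Bob's upvote counts for half of Alice's, and Carol's counts for nothing, exactly as described.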
As the site grows, eventually logged in users will have their own WoT and not rely solely on the mod's. The consequence of this is that a user's feed won't be a commons subject to its tragedies - it will be "private property" which we know scales much better.
We might also make the model more elaborate with time, taking into account more than just upvotes between users.
It occurred to me while implementing the above model that this is basically PageRank, Google's search algorithm, but for trust. Instead of the random surfer choosing a webpage based on backlink quality and quantity, our random surfer is a user choosing a post based on upvote quality and quantity! Unfortunately, as I set out to implement this, I found this patented paper using the exact same Markov model approach.
Again, it's sadly patented. I'm not sure it'd hold up in court, as it's just an application of PageRank - whose patent has long expired - but I don't want to poke a patent troll. Our model works well enough anyway, and their patent will expire in a couple of years; we can probably license it in the meantime if we want to (afaik no one is using it).
Once we build the web of trust, we basically do a breadth-first search from the mod user (me), building up all disjoint, acyclic paths (and their accumulated trust) from me to every other user. We then take these trust values and "compress" them into a single trust value.
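A simplified sketch of that traversal, assuming path trust accumulates as the product of edge values: it enumerates acyclic paths from the mod and records one accumulated trust per path per user. The `graph` is made up for illustration, and this sketch skips the disjointness constraint the real algorithm enforces:

```python
from collections import deque

# Hypothetical trust graph: graph[a][b] = how much a trusts b (0-0.9)
graph = {
    "mod":   {"alice": 0.9, "bob": 0.5},
    "alice": {"bob": 0.9, "carol": 0.2},
    "bob":   {"carol": 0.4},
    "carol": {},
}

def collect_path_trusts(graph, source, max_depth=6):
    """BFS from `source`, enumerating acyclic paths up to `max_depth` edges.
    Trust accumulates multiplicatively along each path. Returns, per user,
    the list of accumulated trusts - one entry per discovered path."""
    path_trusts = {}  # user -> [accumulated trust of each path]
    queue = deque([(source, {source}, 1.0, 0)])
    while queue:
        node, visited, trust, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for neighbor, edge in graph.get(node, {}).items():
            if neighbor in visited:  # keep paths acyclic
                continue
            acc = trust * edge
            path_trusts.setdefault(neighbor, []).append(acc)
            queue.append((neighbor, visited | {neighbor}, acc, depth + 1))
    return path_trusts

paths = collect_path_trusts(graph, "mod")
# carol is reached three ways: via alice, via bob, and via alice->bob
```

The per-user lists produced here are the inputs to the "compression" step below.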
I believe the resulting algorithm scales with N, where N is the number of users in the graph with inbound edges. We do not consider cycles or overlapping paths. Intuitively, overlapping trust relationships have a marginal impact on a user's trust relative to independent ones. Additionally, we limit the depth of the BFS to 6, because that should be far enough to capture every user. We also don't compute the single trust value using the inclusion-exclusion principle, which is

P(A_1 ∪ … ∪ A_k) = Σ_i P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + … + (−1)^(k+1) P(A_1 ∩ … ∩ A_k)
where k is the number of terms, or inbound paths, to a particular node. Instead we approximate it.
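To make the compression step concrete, here's a sketch that treats each path's accumulated trust as an independent probability (an assumption; the post doesn't specify the exact model or the approximation used). Under independence, the brute-force inclusion-exclusion sum over 2^k - 1 subsets collapses to the closed form 1 - ∏(1 - t_i):

```python
from itertools import combinations
from math import prod

def inclusion_exclusion(path_trusts):
    """Union probability over k inbound paths via inclusion-exclusion,
    treating each path's trust as an independent probability so that
    intersections become products. Sums 2^k - 1 terms."""
    total = 0.0
    for size in range(1, len(path_trusts) + 1):
        sign = (-1) ** (size + 1)
        for subset in combinations(path_trusts, size):
            total += sign * prod(subset)
    return total

def complement_product(path_trusts):
    """Equivalent closed form under independence: 1 - prod(1 - t_i).
    Linear in k instead of exponential."""
    return 1 - prod(1 - t for t in path_trusts)

trusts = [0.9, 0.45, 0.3]
exact = inclusion_exclusion(trusts)
fast = complement_product(trusts)
```

The closed form avoids the exponential blow-up in k, which is presumably why an approximation (rather than the literal inclusion-exclusion sum) is attractive here.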
It's all subject to change but that's the bulk of it.
I'll continue to monitor the algorithm and it'll go live if it seems to perform well. I'll simultaneously begin working on some of the smaller items from the gh issue backlog and probably take a swing at search starting next week.