Trust Metrics Mixin #1: PageRank #
We get on these little kicks. As we’re all snooping around. Little protocols or obscure version control systems. Or domino games or something.
Trust metric, man. It just strikes me all the time: I imagine everyone in the world with their hands cupped, a little ball of clay in the middle. Everyone gets a moment to divide up their ball into pieces for selected friends and relatives. Trick is they have to divide up the whole ball. They can’t hang on to any. Clay is redistributed. One guy has a big monolith! People, don’t give him any more! (Unless, of course, he really can be trusted with clay.)
Here’s today’s challenge. You are given objects which respond to:
trust(*objs)
: Assigns one unit of trust in an arc fromself
to each of theobjs
.trusts?(obj)
: Testsself
for any trust ofobj
.trust_spread
: The total number of trusted objects.trust_sphere
: All objects ranked in the system.
Given these objects, provide a mixin which determines a ranking. In today’s case, we’ll be mixing in a pagerank
method, which approximates rank based on a primitive form of Google’s algorithm.
module GoogleTrust DAMPEN = 0.85 def pagerank(ranks = {}) return ranks[self] if ranks.has_key? self ranks[self] = @rank || 0.0 @rank = (1 - DAMPEN) + (DAMPEN * trust_sphere.inject(0) do |n, (o, c)| n + (o.trusts?(self) ? o.pagerank(ranks) / o.trust_spread: 0) end) end end
This algorithm is an approximation of Google’s PageRank, as described in PageRank Explained Correctly with Examples by some third party. Regardless, the basic equation can be seen throughout the web and I will refer you to the paper above in sorting out exactly how PageRank actually works.
You probably get the jist, though: by linking to sites, you are extending a unit of trust to them. But also: the value of your link is also based on the links you are receiving. So, it’s sort of like a tent, ranks evolve from a gelatinous blob into a more rigid and beautifully useful form.
Let’s get some sites running our PageRank. I’ll use some blogs from the Ruby world:
class Site include GoogleTrust @@sites = [] attr_accessor :name def initialize(name) @name = name @arcs = {} @@sites.push self end def trust *sites sites.each { |site| @arcs[site] = (@arcs[site] || 0) + 1 } end def trusts? site; @arcs.has_key? site end def trust_worth site; @arcs[site] end def trust_spread; @arcs.length end def trust_sphere; @@sites end end red = Site.new('RedHanded') eig = Site.new('Eigenclass.org') rails = Site.new('Ruby on Rails') ana = Site.new('Anarchaia') prj = Site.new('Project.ioni.st')
So, sites are represented by a name and an arc of other objects, representing sites which have been linked to. Here are five blogs, let’s make a web.
red.trust rails, eig, ana eig.trust ana rails.trust prj, eig ana.trust prj, red, rails, eig prj.trust red, rails, eig, ana 40.times do |i| puts "--- PASS #{i} ---" red.trusted_sphere.each do |site| puts "#{ site.name }: #{ site.pagerank }" end end
The pagerank
is recursive, it computes the rank of a site based on the ranks of other sites, which are in turn based on its own rank. (Unless the links happen to manifest a heirarchy or a tree shape.) So, the ranking will change during the first many trials of running the metric. After you get to about the fifth pass, things settle down.
Anyway, note the rankings and see if you can work it out. After the first set, try adding some other sites into the mix:
pj = Site.new('PJ Hyett') tf = Site.new('Thomas Fuchs') matz = Site.new('Journal of Matz') red.trust matz, tf, pj eig.trust matz rails.trust tf ana.trust pj, matz matz.trust rails, eig tf.trust rails pj.trust rails, ana, prj 10.times do puts "---" red.trust_sphere.each do |site| puts "#{ site.name }: #{ site.pagerank }" end end
Other challenges applicable to this metric:
- I’ve ignored the
trust_worth
method thus far. What if a site is linked many times? Alter the mixin to allow the units to positively influence the metric. - All sites in the above equation start out equal. Try seeding the rankings. Maybe you want trust from Matz to be inherently more valuable, to add some kind of hierarchy to the mix.
- This trust metric computes a wholistic figure for ranking. In other words: How does this site rank in terms of the entire web? Try altering the fundamental question to: How does this site rank in terms of just Project.ioni.st’s view? This way, we can determine how sites not directly trusted by Project.ioni.st relate to it.
For extra points, if you are employed at Google, debunk my algorithm and leak us something sweet! (Complete code is in metric-pagerank.rb.)
Manfred
Right now you iterate a fixed number of times over the pagerank method of a site until you come to a arbitrary stability of the network. If you use the same constant for large networks, it’s not very likely that the values have stabilized.
You could add another method to site to run until a certain level of rigidness in the network is reached.
why
Yes, totally. Let’s consider that the fourth challenge, Manfred.
Also, I’m sure many of you will wonder, “So, what’s the maximum PageRank in this system? Is this ranked between 0 and 2?” Well, yes, I suppose. But the fundamental idea here is that if you add all the ranks together and average them out, you’ll get
1
. So you grade them along the rankings which emerge.Manfred
Elasticity implemented
Please don’t mind the lack of elegance, I have to sleep now (.
googler
While I can’t tell you anything, I can say that I read this blog and enjoy it!
mfp
I think the maximum PageRank given a network of N nodes is (1-d) + d(1-d)N which is attained when there is a single node pointed to by all the others (which have no incoming links). For example, for a network of 1000 nodes where 999 point to the parent one, the maximum PR would be 127.5225. Note that in this case the average PR is not 1, but 0.2773725, because the PR of the parent node is going to waste (since it has no outgoing links).
Also, the minimum PR is going to be (1-d), never 0.
mfp
I forgot to say that hiki ships with a PageRank plugin. Applied to eigenclass.org, it looks like this:
http://eigenclass.org/hiki.rb?c=pagerank_page
See how the nodes with no incoming (internal) links have pagerank == 0.15 == 1 – d.
You can find the implementation of the PR algorithm in the Hiki sources under misc/plugin/ or here
(It’s a bit late here, so take the following with a grain of salt) that implementation seems to compute the principal eigenvector directly by solving a linear system instead of iteratively as in your code. Pros: no need to worry about how many iterations to do. Cons: uses O(n**2) memory to represent the link matrix, numerical stability? (it uses Gaussian elimination so errors keep growing)
Just realized that I shouldn’t have called the ‘uebernode’ from the previous comment “parent node”, it’s in fact sort of the child of all other nodes.
As for challenge (2), shouldn’t the algorithm converge towards the same result (determined by the link matrix), regardless of the initial pagerank assignments? (It’s 2am and I’m tired so somebody plz check that up :)
Regarding (3), wouldn’t that be rather like Advogato’s trust metric ?
OK zzzzZZZ
googler2
Right handed whuffie to you ;-) http://en.wikipedia.org/wiki/Whuffie
sporkmonger
I’m mildly interested in why you entitled this post “Trust Metrics Mixin #1: PageRank”. One might wonder if perhaps another set of trust metrics are swirling around somewhere?
Mostly I wonder because trust metrics and feeds pretty much are all I bother to think about these days…
Jacob
Uh, _why, you have a slight mistake in your PageRank formula. See this.
why
Manfred: Congratulations. Good one with the
settle
method.sundry googlers: Prove yourselves. Answer this question correctly: how many times did you hear “Double true!” around the office this week?
mfp: Very astute. I can’t wait to hear the solution of
challenge.rb
. Literally the most amazing Ruby code I’ve ever seen.spork: Yes, there are many.
Jacob: Oh, poo. That’s the one page I forgot to check.
hgs
I hope to find time to look at this more deeply later, but isn’t this problem similar to MenTaLguY’s monadic maze explorer , in some way(s)?
MenTaLguY
hgs: Oh, hey, that’s an interesting thought. I’m not sure. Graph traversal with accumulation and bailout-upon-already-visited in both places. The short answer’s probably sort of but not totally.
One difference is that the “visited” state is stored extrinsically here—in the maze thing, it’s intrinsic to the nodes. Which I don’t actually like… I shall have to think about changing my explorer to store that extrinsically somehow.
Comments are closed for this entry.