RedHanded » Trust Metrics Mixin #1: PageRank

Trust Metrics Mixin #1: PageRank #

We get on these little kicks. As we’re all snooping around. Little protocols or obscure version control systems. Or domino games or something.

Trust metric, man. It just strikes me all the time: I imagine everyone in the world with their hands cupped, a little ball of clay in the middle. Everyone gets a moment to divide up their ball into pieces for selected friends and relatives. Trick is they have to divide up the whole ball. They can’t hang on to any. Clay is redistributed. One guy has a big monolith! People, don’t give him any more! (Unless, of course, he really can be trusted with clay.)

Here’s today’s challenge. You are given objects which respond to:

trust(*objs): Assigns one unit of trust in an arc from self to each of the objs.
trusts?(obj): Tests self for any trust of obj.
trust_spread: The total number of trusted objects.
trust_sphere: All objects ranked in the system.

Given these objects, provide a mixin which determines a ranking. In today’s case, we’ll be mixing in a pagerank method, which approximates rank based on a primitive form of Google’s algorithm.

 module GoogleTrust
   DAMPEN = 0.85
   def pagerank(ranks = {})
     return ranks[self] if ranks.has_key? self
     ranks[self] = @rank || 0.0
     @rank =
       (1 - DAMPEN) +
       (DAMPEN * trust_sphere.inject(0) do |n, (o, c)|
         n + (o.trusts?(self) ? o.pagerank(ranks) / o.trust_spread: 0)
       end)
   end
 end

This algorithm is an approximation of Google’s PageRank, as described in PageRank Explained Correctly with Examples by some third party. Regardless, the basic equation can be seen throughout the web and I will refer you to the paper above in sorting out exactly how PageRank actually works.

You probably get the jist, though: by linking to sites, you are extending a unit of trust to them. But also: the value of your link is also based on the links you are receiving. So, it’s sort of like a tent, ranks evolve from a gelatinous blob into a more rigid and beautifully useful form.

Let’s get some sites running our PageRank. I’ll use some blogs from the Ruby world:

 class Site
   include GoogleTrust

   @@sites = []
   attr_accessor :name

   def initialize(name)
     @name = name
     @arcs = {}
     @@sites.push self
   end

   def trust *sites
     sites.each { |site| @arcs[site] = (@arcs[site] || 0) + 1 }
   end

   def trusts? site;     @arcs.has_key? site end
   def trust_worth site; @arcs[site]         end
   def trust_spread;     @arcs.length        end
   def trust_sphere;     @@sites             end
 end

 red = Site.new('RedHanded')
 eig = Site.new('Eigenclass.org')
 rails = Site.new('Ruby on Rails')
 ana = Site.new('Anarchaia')
 prj = Site.new('Project.ioni.st')

So, sites are represented by a name and an arc of other objects, representing sites which have been linked to. Here are five blogs, let’s make a web.

 red.trust   rails, eig, ana
 eig.trust   ana
 rails.trust prj, eig
 ana.trust   prj, red, rails, eig
 prj.trust   red, rails, eig, ana

 40.times do |i|
   puts "--- PASS #{i} ---" 
   red.trusted_sphere.each do |site|
     puts "#{ site.name }: #{ site.pagerank }" 
   end
 end

The pagerank is recursive, it computes the rank of a site based on the ranks of other sites, which are in turn based on its own rank. (Unless the links happen to manifest a heirarchy or a tree shape.) So, the ranking will change during the first many trials of running the metric. After you get to about the fifth pass, things settle down.

Anyway, note the rankings and see if you can work it out. After the first set, try adding some other sites into the mix:

 pj = Site.new('PJ Hyett')
 tf = Site.new('Thomas Fuchs')
 matz = Site.new('Journal of Matz')

 red.trust matz, tf, pj
 eig.trust matz
 rails.trust tf
 ana.trust pj, matz
 matz.trust rails, eig
 tf.trust rails
 pj.trust rails, ana, prj

 10.times do
   puts "---" 
   red.trust_sphere.each do |site|
     puts "#{ site.name }: #{ site.pagerank }" 
   end
 end

Other challenges applicable to this metric:

I’ve ignored the trust_worth method thus far. What if a site is linked many times? Alter the mixin to allow the units to positively influence the metric.
All sites in the above equation start out equal. Try seeding the rankings. Maybe you want trust from Matz to be inherently more valuable, to add some kind of hierarchy to the mix.
This trust metric computes a wholistic figure for ranking. In other words: How does this site rank in terms of the entire web? Try altering the fundamental question to: How does this site rank in terms of just Project.ioni.st’s view? This way, we can determine how sites not directly trusted by Project.ioni.st relate to it.

For extra points, if you are employed at Google, debunk my algorithm and leak us something sweet! (Complete code is in metric-pagerank.rb.)

11 Jan 2006 at 16:06 | 12 comments

Manfred

said on 11 Jan 2006 at 16:38

Right now you iterate a fixed number of times over the pagerank method of a site until you come to a arbitrary stability of the network. If you use the same constant for large networks, it’s not very likely that the values have stabilized.

You could add another method to site to run until a certain level of rigidness in the network is reached.

why

said on 11 Jan 2006 at 16:44

Yes, totally. Let’s consider that the fourth challenge, Manfred.

Also, I’m sure many of you will wonder, “So, what’s the maximum PageRank in this system? Is this ranked between 0 and 2?” Well, yes, I suppose. But the fundamental idea here is that if you add all the ranks together and average them out, you’ll get 1. So you grade them along the rankings which emerge.

Manfred

said on 11 Jan 2006 at 17:33

Elasticity implemented

Please don’t mind the lack of elegance, I have to sleep now (.

googler

said on 11 Jan 2006 at 17:59

While I can’t tell you anything, I can say that I read this blog and enjoy it!

mfp

said on 11 Jan 2006 at 18:29

I think the maximum PageRank given a network of N nodes is (1-d) + d(1-d)N which is attained when there is a single node pointed to by all the others (which have no incoming links). For example, for a network of 1000 nodes where 999 point to the parent one, the maximum PR would be 127.5225. Note that in this case the average PR is not 1, but 0.2773725, because the PR of the parent node is going to waste (since it has no outgoing links).

Also, the minimum PR is going to be (1-d), never 0.

mfp

said on 11 Jan 2006 at 19:16

I forgot to say that hiki ships with a PageRank plugin. Applied to eigenclass.org, it looks like this:

http://eigenclass.org/hiki.rb?c=pagerank_page

See how the nodes with no incoming (internal) links have pagerank == 0.15 == 1 – d.

You can find the implementation of the PR algorithm in the Hiki sources under misc/plugin/ or here

(It’s a bit late here, so take the following with a grain of salt) that implementation seems to compute the principal eigenvector directly by solving a linear system instead of iteratively as in your code. Pros: no need to worry about how many iterations to do. Cons: uses O(n**2) memory to represent the link matrix, numerical stability? (it uses Gaussian elimination so errors keep growing)

Just realized that I shouldn’t have called the ‘uebernode’ from the previous comment “parent node”, it’s in fact sort of the child of all other nodes.

As for challenge (2), shouldn’t the algorithm converge towards the same result (determined by the link matrix), regardless of the initial pagerank assignments? (It’s 2am and I’m tired so somebody plz check that up :)

Regarding (3), wouldn’t that be rather like Advogato’s trust metric ?

OK zzzzZZZ

googler2

said on 11 Jan 2006 at 19:56

Right handed whuffie to you ;-) http://en.wikipedia.org/wiki/Whuffie

sporkmonger

said on 11 Jan 2006 at 20:52

I’m mildly interested in why you entitled this post “Trust Metrics Mixin #1: PageRank”. One might wonder if perhaps another set of trust metrics are swirling around somewhere?

Mostly I wonder because trust metrics and feeds pretty much are all I bother to think about these days…

Jacob

said on 12 Jan 2006 at 00:46

Uh, _why, you have a slight mistake in your PageRank formula. See this.

why

said on 12 Jan 2006 at 01:22

Manfred: Congratulations. Good one with the settle method.

sundry googlers: Prove yourselves. Answer this question correctly: how many times did you hear “Double true!” around the office this week?

mfp: Very astute. I can’t wait to hear the solution of challenge.rb. Literally the most amazing Ruby code I’ve ever seen.

spork: Yes, there are many.

Jacob: Oh, poo. That’s the one page I forgot to check.

hgs

said on 12 Jan 2006 at 04:49

I hope to find time to look at this more deeply later, but isn’t this problem similar to MenTaLguY’s monadic maze explorer , in some way(s)?

MenTaLguY

said on 12 Jan 2006 at 10:21

hgs: Oh, hey, that’s an interesting thought. I’m not sure. Graph traversal with accumulation and bailout-upon-already-visited in both places. The short answer’s probably sort of but not totally.

One difference is that the “visited” state is stored extrinsically here—in the maze thing, it’s intrinsic to the nodes. Which I don’t actually like… I shall have to think about changing my explorer to store that extrinsically somehow.

Comments are closed for this entry.

Archive

Links

Syndicate

Trust Metrics Mixin #1: PageRank #

Manfred

why

Manfred

googler

mfp

mfp

googler2

sporkmonger

Jacob

why

hgs

MenTaLguY