RedHanded

Stuffing Your Hand Down the Disposal

by why in inspect

Since we have our heads tilted in the direction of ruby+gc: Eustáquio “TaQ” Rangel posted concerns about Ruby’s garbage collector yesterday on Ruby-Talk, which led to an interesting bit of code from Yohanes Santoso for watching the garbage collector slurp up variables that have gone out of scope.

Specifically, this code, which creates a bunch of objects and watches them fade from existence.

 class CustObj
   attr_accessor :val, :next
   def initialize(v,n=nil)
     @val        = v
     @next       = n
   end
   def to_s
     "Object #{@val} (#{self.object_id}) points to " \
     "#{@next.nil? ? 'nothing' : @next.val} " \
     "(#{@next.nil? ? '':@next.object_id})" 
   end
 end

 def list
   print "Listing all CustObj's with ObjectSpace\n" 
   print "#{ObjectSpace.each_object(CustObj) {|v| puts v}}" \
         " objects found\n\n" 
 end

 begin        # start a new scope so we can exit it later
   c1 = CustObj.new(1,CustObj.new(2,CustObj.new(3)))
   c4 = CustObj.new(4,CustObj.new(5))
   c6 = CustObj.new(6)
   c1.next.next.next = c1        # comment this and check again
   puts "### Initial" 
   list

   c1 = nil
   c4.next = nil

   GC.start

   puts "### After gc, but still within declaring scope" 
   list
 end

 puts "### Exited the scope" 
 list

 GC.start                        # here I want c1 to disappear

 puts "### After gc, outside of declaring scope" 
 list

Here’s a great script for understanding how the collector works and how important scope is to the collector. The important variable to watch is Object #1. Notice how, even after it’s set to nil, its object is still around because it is referenced by Object #3. And it’s still around after the scope is closed. But once the scope is closed and the GC is manually run, Object #1 disappears.

The point of this illustration isn’t to encourage you to run GC manually. It’s to encourage you to use scope to control the variables you’re hanging on to. Even if it means enclosing some stuff in begin..end.

Here’s a little watcher class, based on the above, that you can use to monitor the presence of objects:

 class GCWatcher
   def initialize; @objs = []; end
   def watch( obj )
     @objs << [obj.object_id, obj.inspect]
     obj
   end
   def list
     puts "** objects watched by GCWatcher **" 
     @objs.each do |obj_id, obj_inspect|
       if ( ObjectSpace._id2ref( obj_id ) rescue nil )
         puts "#{ obj_inspect } is still around." 
       else
         puts "#{ obj_inspect } is collected." 
       end
     end
     puts "** #{ @objs.length } objects watched **" 
     puts
   end
 end

Create a GCWatcher object and fill it with references using the watch method. The object will only keep track of object IDs, so it keeps no reference to the actual objects. See a complete sample here. (Inspired by ruby-talk:147351.)
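As a rough usage sketch (not the linked sample, which isn't reproduced here), the class from the post is repeated below so the snippet runs on its own. The `make_temporary` helper and the watched strings are made up for illustration. Only the report for the object we keep holding is predictable; whether the temporary one has been collected by the time `list` runs depends on the interpreter and on whatever is lying around on the stack.

```ruby
# GCWatcher as defined in the post, repeated so this sketch is self-contained.
class GCWatcher
  def initialize; @objs = []; end
  def watch( obj )
    @objs << [obj.object_id, obj.inspect]
    obj
  end
  def list
    puts "** objects watched by GCWatcher **"
    @objs.each do |obj_id, obj_inspect|
      if ( ObjectSpace._id2ref( obj_id ) rescue nil )
        puts "#{ obj_inspect } is still around."
      else
        puts "#{ obj_inspect } is collected."
      end
    end
    puts "** #{ @objs.length } objects watched **"
    puts
  end
end

watcher = GCWatcher.new
kept = watcher.watch( "kept" )   # we hold on to this reference

# Hypothetical helper: the string is only referenced inside this method.
def make_temporary( watcher )
  watcher.watch( "temporary" )
  nil
end
make_temporary( watcher )

GC.start
watcher.list                     # "kept" is reported as still around;
                                 # "temporary" may go either way
puts kept                        # keep the reference in use
```

Note that `ObjectSpace._id2ref` is an internal, underscore-prefixed call, so treat this as a debugging aid rather than something to build on.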

said on 08 Jul 2005 at 08:38

Just for comparison, this is what the original code looked like: http://redhanded.hobix.com/ruby-talk/147345

Then Yohanes fixed it with the begin … end stuff. And what a difference! No more lost objects there. :-)

said on 08 Jul 2005 at 14:35

Of course, you could also write GCWatcher using WeakRef:


require 'weakref'
class GCWatcher
  def initialize ; @objs = [] ; end
  def watch( obj )
    @objs << [ WeakRef::new(obj), obj.inspect ]
    obj
  end
  def list
    puts "** objects watched by GCWatcher **" 
    @objs.each do |obj_ref, obj_inspect|
      if obj_ref.weakref_alive?
        puts "#{ obj_inspect } is still around." 
      else
        puts "#{ obj_inspect } is collected." 
      end
    end
    puts "** #{ @objs.length } objects watched **" 
    puts
  end
end

IIRC internally WeakRef just does the ObjectSpace._id2ref dance too, but I think it’s slightly more readable.

said on 10 Jul 2005 at 08:49

Am confused… I thought begin...end and the like had the exact same scope as their containers (i.e. the c1 variable is available outside). What’s this craziness with the GC behaving as if that weren’t so?

said on 10 Jul 2005 at 08:52

That is to say, I understand the circular reference and conservative GC bit. I just don’t understand why escaping what seems to me to be a non-existent scope should nudge the GC into cleaning it up.

said on 10 Jul 2005 at 16:44

Actually, the OP was abusing the definition of “conservative GC”. Ruby’s GC would be properly termed precise, rather than conservative.

Conservative collectors look at raw memory and assume that any data that might be a pointer is one. A precise collector is one, like Ruby’s, that knows the structure of its objects and where the references to other objects lie.

said on 10 Jul 2005 at 18:29

Yeah, I second the confusion. begin..end doesn’t seem to start a new scope, so what’s going on? Also, I’ve always heard Ruby’s collector referred to as conservative… I think even the pickaxe book says that. So what’s going on here?

said on 10 Jul 2005 at 18:54

Variables declared inside a begin..end or inside a block are block-local; they perish with the close of the block.

said on 11 Jul 2005 at 06:03

I think the GC is precise when it comes to object to object references, but conservative when it comes to stack to object refs: some of these refs live in the C stack, and Ruby doesn’t know the layout of that so it has to assume that anything that looks like a pointer is one.
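To make the precise/conservative split concrete, here is a toy simulation, not Ruby's actual collector. The heap layout, the addresses and the method names are all invented for illustration. References between objects are known exactly, while roots are guessed by scanning "stack words" for anything that happens to equal a heap address.

```ruby
# Toy heap: address => addresses it references. This is the "precise" part:
# the collector knows exactly where each object's references live.
HEAP = {
  0x1000 => [0x1010],  # object A points to object B
  0x1010 => [],        # object B
  0x1020 => []         # object C, referenced by no other object
}.freeze

# The "conservative" part: any stack word that happens to equal a heap
# address is treated as a root, whether or not it is really a pointer.
def conservative_roots(stack_words, heap = HEAP)
  stack_words.select { |word| heap.key?(word) }
end

# Mark phase: follow references from the roots; returns the live addresses.
def mark(roots, heap = HEAP)
  live = []
  pending = roots.dup
  until pending.empty?
    addr = pending.pop
    next if live.include?(addr)
    live << addr
    pending.concat(heap.fetch(addr))
  end
  live.sort
end

# 0x1000 is a genuine pointer; 0x1020 is just an integer left on the stack.
stack = [42, 0x1000, 0x1020]
live = mark(conservative_roots(stack))
puts live.map { |addr| format('0x%x', addr) }.join(', ')
# prints "0x1000, 0x1010, 0x1020"
```

Object C survives only because a stale integer looks like its address, which is the kind of accident that would make results differ from one machine to the next.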

said on 11 Jul 2005 at 07:55

why – I thought that was the case, but it only seems to work with blocks, not begin..end

said on 12 Jul 2005 at 09:04

irb(main):001:0> x NameError: undefined local variable or method `x' for main:Object from (irb):1 irb(main):002:0> begin irb(main):003:1* x = 1 irb(main):004:1> end => 1 irb(main):005:0> x => 1

said on 12 Jul 2005 at 09:07

Sorry about that. I thought I was getting linebreaks with my ‘code’ tag. Here it is again (doesn’t this show that variables created inside begin..end survive after the end?):


irb(main):001:0> x
NameError: undefined local variable or method `x' for main:Object
        from (irb):1
irb(main):002:0> begin
irb(main):003:1* x = 1
irb(main):004:1> end
=> 1
irb(main):005:0> x
=> 1
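
The same check can be put in a script, next to a block for contrast. This is a small sketch; `scope_report` and the variable names are made up for illustration. It uses `defined?` to ask whether each name is still visible after its construct has closed.

```ruby
# Reports whether a variable first assigned inside begin..end, and one first
# assigned inside a block, are still visible once the construct has closed.
def scope_report
  begin
    in_begin = 1          # begin..end shares the enclosing scope
  end

  [1].each do
    in_block = 2          # first assigned inside a block: block-local
  end

  {
    begin_end: !defined?(in_begin).nil?,
    block:     !defined?(in_block).nil?
  }
end

report = scope_report
puts "begin..end local visible afterwards: #{report[:begin_end]}"
puts "block local visible afterwards:      #{report[:block]}"
# prints true for begin..end and false for the block
```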

said on 12 Jul 2005 at 11:44

Yeah, that was the exact same result I was getting. Is this something that changed with 1.9 perhaps?

said on 12 Jul 2005 at 11:49

No, I get the same results when I do it on 1.9.0.

In a message on ruby-talk, Yohanes Santoso makes a distinction between “declaring scope” and “variable scope” which I suspect has something to do with this. Anybody able to elaborate on this distinction?

said on 12 Jul 2005 at 12:54

Ok, I’m getting very sad now. When I run the CustObj code listed above I never get rid of Object #1. Even after exiting the scope and executing GC.start, it still persists. The only object that ever disappears is Object #5. What is going on here – why are we getting different results? (And while we’re at it, why wasn’t Yohanes Santoso able to reproduce TaQ’s results in the ruby-talk thread in question?)

said on 13 Jul 2005 at 09:53

First of all: don’t expect any of this to be 100% reproducible. Your C stack is probably different from mine. The above results illustrate this too.


batsman@tux-chan:/tmp$ cat gc.rb
class CustObj
    attr_accessor :val, :next
    def initialize(v,n=nil)
        @val        = v
        @next       = n
    end
    def to_s
        "Object #{@val} (#{self.object_id}) points to " \
        "#{@next.nil? ? 'nothing' : @next.val} " \
            "(#{@next.nil? ? '':@next.object_id})" 
    end
end
def list
    print "Listing all CustObj's with ObjectSpace\n" 
    print "#{ObjectSpace.each_object(CustObj) {|v| puts v}}" \
        " objects found\n\n" 
end
def b
    begin        # start a new scope so we can exit it later
        c1 = CustObj.new(1,CustObj.new(2,CustObj.new(3)))
        c4 = CustObj.new(4,CustObj.new(5))
        c6 = CustObj.new(6)
        c1.next.next.next = c1        # comment this and check again
        puts "### Initial" 
        list

        c1 = nil
        c4.next = nil

        GC.start

        puts "### After gc, but still within declaring scope" 
        list
    end

    puts "### I shall die now if begin/end introduces a new scope" 
    a, b = c1, c6  # proves that the vars aren't dead here
    puts "### Well, I guess there's no new scope after all" 

    puts "### Exited the non-scope" 
    list

    GC.start                        # bleh

    puts "### After gc, outside of declaring scope" 
    list
end
b
# < < EOF => mangled msg
# <<-EOF works just fine
puts <<-EOF
Whatever objects were alive before will remain here: there's
something in the C stack that *seems* to point to them. Ruby's GC
*is* conservative, so there is no guarantee that the objects will be
collected (most will, eventually), let alone reclaimed in a timely
manner.
EOF
GC.start                        # die
list
batsman@tux-chan:/tmp$ ruby gc.rb
### Initial
Listing all CustObj's with ObjectSpace
Object 6 (537823986) points to nothing ()
Object 4 (537823996) points to 5 (537824006)
Object 5 (537824006) points to nothing ()
Object 1 (537824016) points to 2 (537824026)
Object 2 (537824026) points to 3 (537824036)
Object 3 (537824036) points to 1 (537824016)
6 objects found
### After gc, but still within declaring scope
Listing all CustObj's with ObjectSpace
Object 6 (537823986) points to nothing ()
Object 4 (537823996) points to nothing ()
Object 1 (537824016) points to 2 (537824026)
Object 2 (537824026) points to 3 (537824036)
Object 3 (537824036) points to 1 (537824016)
5 objects found
### I shall die now if begin/end introduces a new scope
### Well, I guess there's no new scope after all
### Exited the non-scope
Listing all CustObj's with ObjectSpace
Object 6 (537823986) points to nothing ()
Object 4 (537823996) points to nothing ()
Object 1 (537824016) points to 2 (537824026)
Object 2 (537824026) points to 3 (537824036)
Object 3 (537824036) points to 1 (537824016)
5 objects found
### After gc, outside of declaring scope
Listing all CustObj's with ObjectSpace
Object 6 (537823986) points to nothing ()
Object 4 (537823996) points to nothing ()
2 objects found
Whatever objects were alive before will remain here: there's
something in the C stack that *seems* to point to them. Ruby's GC
*is* conservative, so there is no guarantee that the objects will be
collected (most will, eventually), let alone reclaimed in a timely
manner.
Listing all CustObj's with ObjectSpace
Object 6 (537823986) points to nothing ()
Object 4 (537823996) points to nothing ()
2 objects found

said on 13 Jul 2005 at 10:21

Okay, so the begin..end doesn’t introduce its own local variables. It doesn’t leverage local_push. Is it effective in clearing GC, though? And why? Looking through Ruby’s source, the begin..end appears to be just a collection of nodes, nothing more.

said on 13 Jul 2005 at 10:33

Can someone verify that Ruby does in fact walk the C stack? I’d be surprised if it did, as that would seriously reduce its portability…

said on 13 Jul 2005 at 15:49

MenTaLguY: it has to look at the C stack (and mark the objects pointed to by the registers too). Otherwise, writing extensions would be quite a PITA. Read garbage_collect() around:

#if STACK_GROW_DIRECTION < 0
    rb_gc_mark_locations((VALUE*)STACK_END, rb_gc_stack_start);
#elif STACK_GROW_DIRECTION > 0
    rb_gc_mark_locations(rb_gc_stack_start, (VALUE*)STACK_END + 1);
#else
    if ((VALUE*)STACK_END < rb_gc_stack_start)
        rb_gc_mark_locations((VALUE*)STACK_END, rb_gc_stack_start);
    else
        rb_gc_mark_locations(rb_gc_stack_start, (VALUE*)STACK_END + 1);
#endif

said on 14 Jul 2005 at 16:29

Ahh, so it does. And indeed I suspect that is the reason for all the various interesting GC behavior being described here.

said on 15 Jul 2005 at 11:11

We can easily go beyond the mere suspicion:


batsman@tux-chan:/tmp$ cat gc.rb
class CustObj
    attr_accessor :val, :next
    def initialize(v,n=nil)
        @val        = v
        @next       = n
    end
    def to_s
        "Object #{@val} (#{self.object_id}) points to " \
        "#{@next.nil? ? 'nothing' : @next.val} " \
            "(#{@next.nil? ? '':@next.object_id})" 
    end
end
def list
    print "Listing all CustObj's with ObjectSpace\n" 
    print "#{ObjectSpace.each_object(CustObj) {|v| puts v}}" \
        " objects found\n\n" 
end
def b
    begin   # start a new scope so we can exit it later
        c4 = CustObj.new(4,CustObj.new(5))
        c6 = CustObj.new(6)
        c1 = CustObj.new(1,CustObj.new(2,CustObj.new(3)))
        c1.next.next.next = c1     # comment this and check again
        puts "### Initial" 
        list

        (2**100).abs # just to set the breakpoint

        c1 = nil
        c4.next = nil

        puts "### ABOUT TO GC!!" 

        GC.start

        puts "### After gc, but still within declaring scope" 
        list
    end

    puts "### Exited the non-scope" 
    list

    GC.start                     # here I want c1 to disappear

    puts "### After gc, outside of declaring non-scope" 
    list
end
b

Let’s run that under gdb:


batsman@tux-chan:/tmp$ gdb ~/usr/bin/gruby
(gdb) set args gc.rb

We set a breakpoint at rb_big_abs in order to get the object_id of the first object and set a conditional breakpoint based on that:


(gdb) break rb_big_abs
Breakpoint 1 at 0x80c72e9: file bignum.c, line 1994.
(gdb) run
Starting program: /home/batsman/usr/bin/gruby gc.rb
### Initial
Listing all CustObj's with ObjectSpace
Object 1 (537824326) points to 2 (537824336)
Object 2 (537824336) points to 3 (537824346)
Object 3 (537824346) points to 1 (537824326)
Object 6 (537824356) points to nothing ()
Object 4 (537824366) points to 5 (537824376)
Object 5 (537824376) points to nothing ()
6 objects found
Breakpoint 1, rb_big_abs (x=1075647532) at bignum.c:1994
1994        if (!RBIGNUM(x)->sign) {
(gdb) list gc_mark
705
706     void
707     gc_mark(ptr, lev)
708         VALUE ptr;
709         int lev;
710     {
711         register RVALUE *obj;
712
713         obj = RANY(ptr);
714         if (rb_special_const_p(ptr)) return; /* special const not marked */

Now we can see when the first object is about to be marked:


(gdb) break gc_mark
Breakpoint 2 at 0x806e0f3: file gc.c, line 713.
(gdb) cond 2 ptr == 2 * 537824326
(gdb) cont
Continuing.
### ABOUT TO GC!!
Breakpoint 2, gc_mark (ptr=1075648652, lev=0) at gc.c:713
713         obj = RANY(ptr);

Breakpoint reached. Let’s see what caused that object to be marked (i.e. where the reference came from):


(gdb) bt
#0  gc_mark (ptr=1075648652, lev=0) at gc.c:713
#1  0x0806dfb9 in mark_locations_array (x=0xbfffe188, n=1395) at gc.c:626
#2  0x0806dfe7 in rb_gc_mark_locations (start=0xbfffde08, end=0xbffff758) at gc.c:639
#3  0x0806f0e6 in garbage_collect () at gc.c:1356
[...]

Alright, there was a ref in the C stack. Let’s cont until the current stack frame returns…


(gdb) finish
Run till exit from #0  gc_mark (ptr=1075648652, lev=0) at gc.c:713
Breakpoint 2, gc_mark (ptr=1075648652, lev=3) at gc.c:713
713         obj = RANY(ptr);
(gdb) bt
#0  gc_mark (ptr=1075648652, lev=3) at gc.c:713
#1  0x0806e001 in mark_entry (key=10274, value=1075648652, lev=3) at gc.c:648
#2  0x080af3d7 in st_foreach (table=0x812c688, func=0x806dfe9 <mark_entry>, arg=3) at st.c:496
#3  0x0806e030 in mark_tbl (tbl=0x812c688, lev=3) at gc.c:658
#4  0x0806e51e in gc_mark_children (ptr=1075648692, lev=3) at gc.c:948
#5  0x0806e195 in gc_mark (ptr=1075648692, lev=2) at gc.c:731
#6  0x0806e001 in mark_entry (key=10274, value=1075648692, lev=2) at gc.c:648
#7  0x080af3d7 in st_foreach (table=0x812c700, func=0x806dfe9 <mark_entry>, arg=2) at st.c:496
#8  0x0806e030 in mark_tbl (tbl=0x812c700, lev=2) at gc.c:658
#9  0x0806e51e in gc_mark_children (ptr=1075648672, lev=2) at gc.c:948
#10 0x0806e195 in gc_mark (ptr=1075648672, lev=1) at gc.c:731
#11 0x0806e001 in mark_entry (key=10274, value=1075648672, lev=1) at gc.c:648
#12 0x080af3d7 in st_foreach (table=0x812c778, func=0x806dfe9 <mark_entry>, arg=1) at st.c:496
#13 0x0806e030 in mark_tbl (tbl=0x812c778, lev=1) at gc.c:658
#14 0x0806e51e in gc_mark_children (ptr=1075648652, lev=1) at gc.c:948
#15 0x0806e195 in gc_mark (ptr=1075648652, lev=0) at gc.c:731
#16 0x0806dfb9 in mark_locations_array (x=0xbfffe188, n=1395) at gc.c:626
#17 0x0806dfe7 in rb_gc_mark_locations (start=0xbfffde08, end=0xbffff758) at gc.c:639
#18 0x0806f0e6 in garbage_collect () at gc.c:1356
[...]

Isn’t this beautiful? obj1@1075648652 references obj2@1075648672, which is marked in gc_mark_children, which iterates over the entries in the iv_tbl. Likewise, obj2@1075648672 points to obj3@1075648692, which points back to obj1@1075648652. This is why we ran into the gc_mark breakpoint for the second time (!).
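
The numbers line up if you double the object IDs, as the `cond 2 ptr == 2 * 537824326` line does. The little check below just redoes that arithmetic for the three objects in the cycle. The factor of two is taken from this session, on this interpreter build; it is an assumption that it holds anywhere else, and `value_for` is a made-up helper name.

```ruby
# object_ids printed by the script in the gdb session above
OBJECT_IDS = { 1 => 537824326, 2 => 537824336, 3 => 537824346 }.freeze

# Assumption drawn from the session: the pointer gc_mark sees is twice
# the object_id that Ruby printed.
def value_for(object_id)
  object_id * 2
end

OBJECT_IDS.each do |n, id|
  puts "obj#{n}: object_id #{id} -> ptr #{value_for(id)}"
end
# prints:
# obj1: object_id 537824326 -> ptr 1075648652
# obj2: object_id 537824336 -> ptr 1075648672
# obj3: object_id 537824346 -> ptr 1075648692
```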


(gdb) cont
Continuing.
### After gc, but still within declaring scope
Listing all CustObj's with ObjectSpace
Object 1 (537824326) points to 2 (537824336)
Object 2 (537824336) points to 3 (537824346)
Object 3 (537824346) points to 1 (537824326)
Object 6 (537824356) points to nothing ()
Object 4 (537824366) points to nothing ()
5 objects found
### Exited the non-scope
Listing all CustObj's with ObjectSpace
Object 1 (537824326) points to 2 (537824336)
Object 2 (537824336) points to 3 (537824346)
Object 3 (537824346) points to 1 (537824326)
Object 6 (537824356) points to nothing ()
Object 4 (537824366) points to nothing ()
5 objects found
### After gc, outside of declaring non-scope
Listing all CustObj's with ObjectSpace
Object 6 (537824356) points to nothing ()
Object 4 (537824366) points to nothing ()
2 objects found
Program exited normally.
(gdb) quit

Comments are closed for this entry.