The Fully Upturned Bin #
I have something for you. I’ve just wrapped up a very sobering article on memory management in Ruby which will surely be a slap in the face to all of you raging deviants. Let’s not let this continue to be a shrouded and ignored issue.
I won’t say I’ve hit the nail on the head with this essay. It’s a brutal read: The Fully Upturned Bin.
The synopsis is:
- Quit lying to yourself, you control GC.
- Rather than leaving trash and temporary objects everywhere, try to save the pieces and throw them away at once.
- Don’t hang on to useless data, isolate its use into a scope.
- Use Ruby’s ALLOC_N in your extensions.
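The scoping advice (point three) can be sketched like this; the method here is invented for illustration:

```ruby
# Confining the big temporary to its own method means the reference
# dies when the method returns, so the GC is free to sweep the whole
# array -- rather than it lingering, reachable, in a long-lived scope.
def average(count)
  samples = Array.new(count) { rand }  # large temporary, alive only here
  samples.sum / samples.size
end

average(10_000)  # somewhere between 0.0 and 1.0
```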
Thank you. I think we’re all better now.
timsuth
GC_MALLOC_LIMIT (~8mb) isn’t just for the first 8 megs. Have a look at malloc_increase and malloc_limit in gc.c. It means Ruby will GC whenever you have allocated more than XXX since the last GC.
XXX starts at 8mb, but changes according to:
malloc_limit += (malloc_increase - malloc_limit) * (double)live / (live + freed);
(live is the number of live objects; freed is the number we will be getting rid of, I think.)
timsuth
That formula updates malloc_limit, by the way. malloc_increase is the amount you have allocated since the last GC.
cilibrar
Wunderbar! This is the missing link I have been waiting for, for two years now. Thanks so much, Why; you really did the right thing, at least this time.
hgs
This provokes 2 questions from me:
1) If there are all these heaps which Ruby uses for holding objects in groups of 10_000, then would it be particularly difficult to have Heap as a native type in Ruby?
2) I’m wondering to what extent allocation comes into the speed of deletions from arrays/hashes. I have on a couple of occasions reasoned that database access, or complicated maths, would be faster if I cached the results. However, whether using a bounded Array of Arrays with assoc(), or a Hash with a limited number of keys, caching has been slower than not caching at all. Are there well-known ways of doing this effectively that I have missed?
The Lua book has an interesting take on building long strings out of small ones, which I think only applies when strings are immutable, but it’s interesting nonetheless: 11.6 String buffers which uses a sort of Tower of Hanoi—you can’t put a long string on the end of a shorter one.
flgr
You could try using Symbols, which are faster to compare and look up in Hashes than regular Strings. You can convert a String to a Symbol via String#to_sym, and there is also a Symbol literal notation, :foo or :"foo", where the latter form also allows for interpolation.
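A quick sketch of the difference (the hash contents here are made up):

```ruby
# Every occurrence of :employees refers to the same interned object,
# so Hash lookup can compare keys by identity rather than hashing
# the string's bytes each time.
stats = { employees: 42, revenue: 1_000 }

stats[:employees]        # => 42
"revenue".to_sym         # => :revenue
stats["revenue".to_sym]  # => 1000

# By contrast, each "revenue" literal builds a fresh String object:
"revenue".equal?("revenue")  # => false
:revenue.equal?(:revenue)    # => true
```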
why
timsuth: if people want me to go into that, I will. My point was that once you rise above eight megs, the allocation limits will change. “Ruby will begin to watch your allocation needs and resize the limit accordingly.” I don’t really understand the logic behind the malloc_increase computation, maybe someone can clue me in on it. Specifically, why is live / freed important?
Nikolai Weibull
You should really go into the fact that allocating a contiguous region is a lot harder on the memory manager than many small regions.
hgs: you can certainly put a long string on the end of a shorter one. The algorithm makes sure that they’ll stack properly.
I’m again going to say that the rope data structure would be really great to have in Ruby, as it encapsulates the behavior of multiple concatenations efficiently.
MenTaLguY
hgs: What would you do with a native Heap object if you had one?
flgr: Symbols are also less expensive memory-wise: if the same string values get used multiple times, each unique value is only stored in the underlying table once.
There are disadvantages, though.
One thing to be aware of is that (as far as I know) the strings in the underlying table never get freed. So it’s probably best not to use Symbols for cases where you expect to be encountering a whole lot of unique strings—you’d just grow the table really fast.
The other thing is that Symbols can’t be used to represent an empty string.
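The table growth is easy to watch. (On the Rubies of this era, dynamically created Symbols were never collected; since Ruby 2.2 they can be, but only once nothing references them, so we hold references here.)

```ruby
before = Symbol.all_symbols.size

# Hold references so none of these can be collected while we count.
syms = (0...100).map { |i| "upturned_bin_demo_#{i}".to_sym }

Symbol.all_symbols.size - before  # => 100
```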
_why: It’s really the whole live / (live + freed) ratio that’s relevant—it reflects the percentage of objects which survived the just-completed GC pass. Multiplying by it reduces the magnitude of malloc_limit changes during periods of massive freeing (when a smaller percentage of objects have survived).
This is probably motivated by situations like those where a bunch of objects get reaped in response to (what might be) the first of a slew of new allocations. Dropping the limit drastically only to have to increase it again could have performance consequences. The ratio introduces a little bit of hysteresis.
Determining whether that is actually helpful is probably the purview of benchmarkers.
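A toy model of that adjustment (this mirrors the gc.c formula above; it is not Ruby’s actual internals):

```ruby
def next_malloc_limit(malloc_limit, malloc_increase, live, freed)
  survival = live.to_f / (live + freed)
  # A low survival rate damps the change, so a massive sweep doesn't
  # whipsaw the limit down only to raise it again moments later.
  (malloc_limit + (malloc_increase - malloc_limit) * survival).to_i
end

# 16mb allocated since the last GC, but only 25% of objects survived:
# the limit moves just a quarter of the way toward the new demand.
next_malloc_limit(8_000_000, 16_000_000, 250, 750)  # => 10000000
```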
why
Good points, MenTaL. The point about symbols is worth teaching. This is exactly what I’m looking for: how we’re unknowingly affecting memory, the decisions we’re making by using certain approaches. We just don’t talk about it because it’s probably too tied to the current implementation—it could change in the future.
My bet is that it won’t for some time. And, when it does, we still need to know what’s going on.
Okay, I buy the explanation of live / (live + freed) as a simple hysteresis. Now, how does this affect the common user? I think I’m covered by recommending that we free a big pile of objects at once rather than trickling them out.
MenTaLguY
_why: Probably yeah.
Incidentally, there’s another bit you might want to cover under category #2—closures.
A closure will hang on to all the variables belonging to the scope in which it is created, including self and all of its instance variables. This is true whether or not any of those variables are actually used within the closure.
For example:
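A sketch of the capture (make_counter is an invented name; Proc#binding and Binding#local_variable_get let us peek at what got retained):

```ruby
def make_counter
  big = Array.new(1_000_000) { 0 }  # heavyweight temporary
  count = 0
  # The lambda captures the whole scope -- `big` included --
  # even though its body only ever touches `count`.
  lambda { count += 1 }
end

counter = make_counter
counter.call                                   # => 1
counter.binding.local_variable_get(:big).size  # => 1000000
```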
When hanging onto parameters or local variables would create problems, two options are:
1. Set such variables to nil before creating the closure.
2. Define a separate method which actually creates the closure.
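Minimal sketches of both options (the handler names are invented for illustration):

```ruby
# Option 1: nil out the heavy local before creating the closure.
def handler_a(big_list)
  size = big_list.size
  big_list = nil  # the closure still captures the variable, but it
                  # now points at nil instead of pinning the whole list
  lambda { "processed #{size} items" }
end

# Option 2: create the closure in a separate, smaller scope, so the
# heavy local was never part of the captured scope to begin with.
def handler_b(big_list)
  build_handler(big_list.size)
end

def build_handler(size)
  lambda { "processed #{size} items" }
end
```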
Of course, you need to be conscious of holding onto self too. That may mean using a secondary object in much the same way as the example above uses a secondary method.
MenTaLguY
Actually, never mind a temporary object. If you don’t need to hold onto self, you can simply do:
nil.instance_eval { lambda { ... } }
Which will still capture any local variables, but not self or instance variables.
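A sketch of what that buys you (the Widget class is invented; Binding#receiver shows the block’s self):

```ruby
class Widget
  def initialize
    @payload = "x" * 1_000_000
  end

  def cheap_callback
    label = "widget"
    # self inside the block is nil, so neither the Widget instance nor
    # @payload is retained by the lambda; the local `label` still is.
    nil.instance_eval { lambda { label } }
  end
end

cb = Widget.new.cheap_callback
cb.call              # => "widget"
cb.binding.receiver  # => nil
```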
Incidentally, if you were wondering why Ruby can’t automatically close over only the variables actually used within the body of the closure, this is why:
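A sketch of the culprit:

```ruby
def sneaky
  secret = 42
  # The block's source never mentions `secret`, yet the eval can still
  # reach it -- so Ruby has no choice but to retain the entire scope.
  lambda { |expr| eval(expr) }
end

sneaky.call("secret")  # => 42
```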
Bloody string evals…
why
MenTaL: brilliant, yes, this all needs to be added. I should probably expand this essay into a four-pager (overview, marking, sweeping, extensions).
The “marking” section would contain stuff like the above: what does Ruby hang on to?
timsuth
As far as the “common user” is affected by malloc_increase and friends, I guess we can just say: “The GC also runs whenever you have allocated a certain amount since the last GC.”
The “When Ruby Sweeps” section in the article currently implies that garbage collection is only run when memory runs out, or when it is triggered manually.
MenTaLguY
Is there much to be written about sweeping, really? Maybe finalization…
Oh, incidentally, if you want to see true evil with regard to reference retention, check out continuations… they necessarily retain all the selves and local variables throughout the entire call stack at the moment they are taken.
Of course unlike string evals (mutter…), I happen to be rather fond of continuations.
why
Thanks, tim. You’re absolutely right. I’ve changed the wording of the first GC condition to clear that up. I’ve also added a sentence describing that the 8 meg limit is tracked by the “internal memory counter”—lingo for malloc_limit.
hgs
Heaps allow O(log n) insertion and removal, so I can have a quicker cache. “Programming Pearls” devotes one “column” (a 14-page chapter) to them, so they must be good. Native so everyone has them, and for speed. And they are good for priority queues. And probably a whole bunch of things I’ve not dreamt of.
Matt
In this case “heap” just means “a big pile-o-memory”, not the data structure.
hgs
Blast! :-(
Comments are closed for this entry.