DragonFly's Freezer
People tend to dismiss DragonFlyBSD as the pipe dream of Matt Dillon, discounting all the work Matt and his little team have done. DragonFly is my favorite OS and it works great for me. Virtual kernels are nicer than jails (in my opinion) and are becoming much easier thanks to stuff like the vkernel manager. Pkgsrc integration is getting much better (try: man pkg_radd) and checkpointing is such a juicy feature. And, of course, we’re all excited about HAMMER.
The dfly 1.12 release was yesterday. And I guess it’s not much of a big deal compared to what 2.0 will be. But if you’re new to dfly, then you need some time to catch up anyway. (Look, just play with it. Run the 1.12 iso under Qemu 0.9.1. These instructions still apply.)
But let’s get back to checkpointing. A while ago, I started working on improvements to Try Ruby. To allow users to save and resume their sessions. A perfect job for checkpointing.
Here’s the deal. On DragonFly, you can send SIGCKPT to a process. And that process will be frozen to disk. It saves as an executable binary. You run the binary and the app is back. Similar to how the Terminator was back, if that helps at all. (On Linux, you can use CryoPID.)
After some searching, I found some broken links to a lib that Michael Neumann had scrabbled together. A checkpointing API for Ruby. Believe it! He sent me his work by e-mail and I added some auto-gzip support to it.
svn co http://code.whytheluckystiff.net/svn/dfly/trunk dfly
Checkpointing as an API call is just marvellous. In a way, it’s sort of treated like forking. Because when you fork, you return from the same method twice: once in the child process and once in the parent. And each scenario gets a different return value.
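For anyone who hasn't seen the fork half of that comparison, here is a minimal sketch in plain Ruby, no Dfly involved. It assumes a Unix-like system where Kernel#fork is available. The fork_demo name and the pipe used to pass a message back are just for illustration.

```ruby
# Kernel#fork without a block returns twice:
# nil in the child process, and the child's pid in the parent.
def fork_demo
  reader, writer = IO.pipe
  pid = fork

  if pid.nil?
    # Child side: fork returned nil here.
    reader.close
    writer.puts "child saw nil"
    writer.close
    exit!(0) # leave right away, skipping at_exit handlers
  else
    # Parent side: fork returned the child's pid here.
    writer.close
    message = reader.read.chomp
    reader.close
    Process.wait(pid)
    { child_pid: pid, child_message: message }
  end
end

result = fork_demo
puts "parent saw pid #{result[:child_pid]}"
puts result[:child_message]
```

Same method, two returns, and each side branches on the value it got back. The checkpoint call follows the same shape.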
You call Dfly.checkpoint("test.ckpt.gz") to save a snapshot. And the checkpoint either returns Dfly::FROZEN or Dfly::RESUMED. So, Dfly::FROZEN is returned after the snapshot is saved. And Dfly::RESUMED is returned if the app is just now coming back into animation.
So like, imagine:
require 'dfly'
puts "THIS APP IS GOING TO FREEZE"
case Dfly.checkpoint("test.ckpt.gz")
when Dfly::FROZEN
  # the snapshot was just written, so quit
  exit
when Dfly::RESUMED
  # we are running again from the saved snapshot
  puts "THIS APP IS THAWED OUT"
end
Alternatively, you can give Dfly.checkpoint a block that runs only when the app is resumed. Great for re-opening sockets and the like. This block was Michael Neumann’s idea in the first place, so give him the extra bonus lives and carnation wreaths.
To resume: Dfly.resume("test.ckpt.gz"). Which can be done from another script entirely and the process will jump back to when it was first clubbed on the head.
andre
I guess in the long run you’ll be able to run a dfly cluster and move processes around machines by checkpointing them and restarting somewhere else. That would be way cool.
why, sorry to make a support request here, but hoodwink.d seems to be broken for my site (sneakymustard.com):
ActiveRecord::StatementInvalid Mysql::Error: Column 'site_id' cannot be null: INSERT INTO hoodwinkd_posts (`permalink`, `site_id`, `title`, `last_wink_id`, `layer_id`, `wink_count`, `created_at`, `first_wink_id`) VALUES ('/2008/02/25', NULL, '', 0, 931, 0, '2008-02-27 21:16:47', 0)
Do you think you could have a look at that?
ryantm
What are the long-term performance considerations of this? What if you want to upgrade the Ruby version you’re running, or inject some new code?
<|:{
Ah, this is wonderful… and just in time for Spring!
_why
andre: I don’t think you can thaw a checkpointed app out on a different machine under DragonFly. I guess it’s been several months since I’ve tried. I think you can with CryoPID, though.
ryantm: Upgrading the checkpointed app? No, well, bad news: unfortunately, checkpointing doesn’t just magically turn everything into Smalltalk.
technomancy
does this mean I don’t have to learn how continuations work?
Jesse
Andre: I’ve noticed that problem recently posting comments against any url that ends with a slash. You know, like “domain.com/dir” instead of “domain.com/page.html”. To wit, I can’t comment on homestarrunner or xkcd.
Oh yeah, and it’s lonely out there, unless you all stuck me in a satellite office. :)
defunkt
I didn’t really understand until I re-read the Terminator reference. It’s all so clear now!
Also, this is really great. It makes me want to install dfly to check it out. Dfly.checkpoint("user.#{user.id}.ckpt.gz") on top of something simple like Camping or Sinatra might be amazing.
andre
why: not now, but I believe this is planned for the future… when there’s a global FS and something to deal with cluster-wide process numbers, file descriptors, etc… I mean, dfly’s end goal is to provide SSI, so it should be possible one day.
But we’re still seeing the beginning of it all.
Carsten
Re checkpointing looking like fork(): In 1994, we wrote:
It worked great for my thesis project in 1990, when it would have taken minutes to start up an editor with all the initializations required. Of course, the underlying code was inspired by Emacs.
_why
Very cool citation, Carsten. It’s good to see the idea’s still moving along.
Campzilla
It’s like plugging an Action Replay MK VI cartridge into ruby. Poke 53222, $EA for infinite stack!
http://upload.wikimedia.org/wikipedia/commons/thumb/0/03/Action.JPG/200px-Action.JPG
http://everything2.com/index.pl?node_id=1294633
Nathan de Vries
It’s strange how people start talking about things you’ve recently been thinking about. I was recently talking to some friends at work about how cool it would be to use continuations for debugging purposes. Instead of sending out an email with a stacktrace & a bunch of context info, why not create a continuation which may later be used to recreate the problematic environment?
Being able to do process hibernation would be pretty damn cool in that scenario (imagine getting a loadable application_error.tgz?), let alone in a Seaside-like sessions-are-continuations web framework.