DragonFly's Freezer

February 27th 15:19
by why

People tend to dismiss DragonFlyBSD as the pipe dream of Matt Dillon, discounting all the work Matt and his little team have done. DragonFly is my favorite OS and it works great for me. Virtual kernels are nicer than jails (in my opinion) and are becoming much easier thanks to stuff like the vkernel manager. Pkgsrc integration is getting much better (try: man pkg_radd) and checkpointing is such a juicy feature. And, of course, we’re all excited about HAMMER.

The dfly 1.12 release was yesterday. And I guess it’s not much of a big deal compared to what 2.0 will be. But if you’re new to dfly, then you need some time to catch up anyway. (Look, just play with it. Run the 1.12 iso under Qemu 0.9.1. These instructions still apply.)


But let’s get back to checkpointing. A while ago, I started working on improvements to Try Ruby. To allow users to save and resume their sessions. A perfect job for checkpointing.

Here’s the deal. On DragonFly, you can send SIGCKPT to a process. And that process will be frozen to disk. It saves as an executable binary. You run the binary and the app is back. Similar to how the Terminator was back, if that helps at all. (On Linux, you can use CryoPID.)
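For the signal route, here is a minimal sketch of poking a process from Ruby. The helper name request_checkpoint is made up for illustration. It is also an assumption that your Ruby build lists the signal under the name "CKPT"; if it doesn't, pass the number from your system's signal.h yourself.

```ruby
# Hypothetical helper: ask the kernel to checkpoint the process `pid`.
# Assumption: Signal.list may or may not contain "CKPT" on a given Ruby
# build, so the caller can supply the signal number explicitly instead.
def request_checkpoint(pid, signo = Signal.list["CKPT"])
  # No known signal number means there is nothing we can send.
  return false if signo.nil?

  Process.kill(signo, pid)
  true
end

# Usage (only meaningful on DragonFly):
#   request_checkpoint(1234)
```

The helper returns false rather than raising when the signal is unknown, so the same script can run harmlessly on a system without checkpointing.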

After some searching, I found some broken links to a lib that Michael Neumann had scrabbled together. A checkpointing API for Ruby. Believe it! He sent me his work by e-mail and I added some auto-gzip support to it.

svn co http://code.whytheluckystiff.net/svn/dfly/trunk dfly

Checkpointing as an API call is just marvellous. In a way, it’s sort of treated like forking. Because when you fork, you return from the same method twice: once in the child process and once in the parent. And each scenario gets a different return value.
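If the fork comparison is fuzzy, this small sketch shows the "returns twice" behavior using plain Ruby on a Unix-like system. The method name fork_demo is only for illustration. The child reports what it saw through a pipe so the parent can print both sides.

```ruby
# fork returns twice: nil in the child, the child's pid in the parent.
def fork_demo
  reader, writer = IO.pipe
  pid = fork

  if pid.nil?
    # Child: fork handed us nil. Report it and leave without running
    # any at_exit handlers inherited from the parent.
    reader.close
    writer.puts "child saw #{pid.inspect}"
    writer.close
    exit!(0)
  else
    # Parent: fork handed us the child's pid.
    writer.close
    message = reader.read.chomp
    reader.close
    Process.wait(pid)
    [message, pid]
  end
end

message, pid = fork_demo
puts "parent saw #{pid}; #{message}"
```

Same call site, two different return values, and each process branches on the one it got. A checkpoint call that returns one value when freezing and another when thawing follows the same shape.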

You call Dfly.checkpoint("test.ckpt.gz") to save a snapshot. And the checkpoint either returns Dfly::FROZEN or Dfly::RESUMED. So, Dfly::FROZEN is returned after the snapshot is saved. And Dfly::RESUMED is returned if the app is just now coming back into animation.

So like, imagine:

require 'dfly'
puts "THIS APP IS GOING TO FREEZE"
case Dfly.checkpoint("test.ckpt.gz")
when Dfly::FROZEN
  exit
when Dfly::RESUMED
  puts "THIS APP IS THAWED OUT"
end

Alternatively, you can give Dfly.checkpoint a block that runs only when the app is resumed. Great for re-opening sockets and the like. This block was Michael Neumann’s idea in the first place, so give him the extra bonus lives and carnation wreaths.
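To make the block form concrete without a DragonFly box, here is a sketch built on a stand-in. SimDfly is hypothetical and is not the real library: it never writes a snapshot, and the resumed: keyword only exists to pick which of the two paths to simulate. What the real Dfly.checkpoint returns when given a block is also an assumption here.

```ruby
# Hypothetical stand-in for the Dfly module, for illustrating control flow only.
module SimDfly
  FROZEN  = :frozen
  RESUMED = :resumed

  # Simulates a checkpoint call. `path` is accepted but unused;
  # `resumed:` is an invented switch, not part of the real API.
  def self.checkpoint(path, resumed: false)
    return FROZEN unless resumed

    yield if block_given? # only the thawed process runs the block
    RESUMED
  end
end

log = []

# Freezing pass: snapshot "saved", block skipped.
first = SimDfly.checkpoint("test.ckpt.gz") { log << "reopen sockets" }

# Thawed pass: the block runs before execution continues.
second = SimDfly.checkpoint("test.ckpt.gz", resumed: true) { log << "reopen sockets" }

p first, second, log
```

With the real library, the call would presumably look like Dfly.checkpoint("test.ckpt.gz") { ... } with the socket re-opening code inside the braces.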

To resume: Dfly.resume("test.ckpt.gz"). Which can be done from another script entirely and the process will jump back to when it was first clubbed on the head.

12 comments

andre

said on February 27th 21:21

I guess in the long run you’ll be able to run a dfly cluster and move processes around machines by checkpointing them and restarting somewhere else. That would be way cool.

why, sorry to make a support request here, but hoodwink.d seems to be broken for my site (sneakymustard.com): ActiveRecord::StatementInvalid Mysql::Error: Column ‘site_id’ cannot be null: INSERT INTO hoodwinkd_posts (`permalink`, `site_id`, `title`, `last_wink_id`, `layer_id`, `wink_count`, `created_at`, `first_wink_id`) VALUES (‘/2008/02/25’, NULL, ‘’, 0, 931, 0, ’2008-02-27 21:16:47’, 0). Do you think you could have a look at that?

ryantm

said on February 27th 22:54

What are the long term performance considerations of this? What if you want to upgrade your ruby version running, or inject some new code?

<|:{

said on February 27th 23:07

Ah, this is wonderful… and just in time for Spring!

_why

said on February 27th 23:56

andre: I don’t think you can thaw a checkpointed app out on a different machine under DragonFly. I guess it’s been several months since I’ve tried. I think you can with CryoPID, though.

ryantm: Upgrading the checkpointed app? No, well, bad news: unfortunately, checkpointing doesn’t just magically turn everything into Smalltalk.

technomancy

said on February 27th 23:57

does this mean I don’t have to learn how continuations work?

Jesse

said on February 28th 02:07

Andre: I’ve noticed that problem recently posting comments against any url that ends with a slash. You know, like “domain.com/dir” instead of “domain.com/page.html”. To wit, I can’t comment on homestarrunner or xkcd.

Oh yeah, and it’s lonely out there, unless you all stuck me in a satellite office. :)

defunkt

said on February 28th 11:36

I didn’t really understand until I re-read the Terminator reference. It’s all so clear now!

Also, this is really great. It makes me want to install dfly to check it out. Dfly.checkpoint("user.#{user.id}.ckpt.gz") on top of something simple like Camping or Sinatra might be amazing.

andre

said on February 28th 13:39

why: not now, but I believe this is planned for the future… when there’s a global FS and something to deal with cluster-wide process numbers, file descriptors, etc… I mean, dfly’s end goal is to provide SSI, so it should be possible one day.

But we’re still seeing the beginning of it all.

Carsten

said on February 28th 16:24

Re checkpointing looking like fork(): In 1994, we wrote:

The dump primitive of Elk differs from existing, similar mechanisms in that the newly created executable, when called, starts at the point where dump was called in the original invocation (as opposed to the program’s main entry point). Here the return value of dump is “true”, while in the original invocation it returns “false”—not unlike the UNIX fork system call.

It worked great for my thesis project in 1990, when it would have taken minutes to start up an editor with all the initializations required. Of course, the underlying code was inspired by Emacs.

_why

said on February 29th 00:03

Very cool citation, Carsten. It’s good to see the idea’s still moving along.

Campzilla

said on February 29th 15:27

It’s like plugging an Action Replay MK VI cartridge into ruby. Poke 53222, $EA for infinite stack!

http://upload.wikimedia.org/wikipedia/commons/thumb/0/03/Action.JPG/200px-Action.JPG

http://everything2.com/index.pl?node_id=1294633

Nathan de Vries

said on March 3rd 05:15

It’s strange how people start talking about things you’ve recently been thinking about. I was recently talking to some friends at work about how cool it would be to use continuations for debugging purposes. Instead of sending out an email with a stacktrace & a bunch of context info, why not create a continuation which may later be used to recreate the problematic environment?

Being able to do process hibernation would be pretty damn cool in that scenario (imagine getting a loadable application_error.tgz?), let alone in a Seaside-like sessions-are-continuations web framework.

Comments are closed for this entry.