PickPocket, a Marshal Ransack Hack
Today’s hack is a Marshal hack, which is a highly common (but quite untapped) language that also has no formal specification beyond what Ruby’s source code has to say. Most times you only hear about slight changes between major Ruby versions (1.6 -> 1.8) when something goes kaput. No Ruby books I know of go near dissecting it. And, strangely enough, Minero Aoki’s classic Ruby Hacking Guide doesn’t even touch it.
We are voodoo doctors. Take us to the center of the marsh.
Tasting Just a Few Sample Bytes
Marshal is the ultra-slim encoded bytespeak that Ruby can pull out when siphoning objects through a skinny straw. For when your Ruby shares a malt with someone else’s Ruby. Yeah, well, it’s actually very easy to pick up.
>> Marshal.dump("Koichi")
=> "\004\010\"\vKoichi"
Say, that’s not too bad. We dumped out a little Marshal and you can at least see plain old Koichi in there. Other than that there are just four other characters, starting with "\004\010", which is the Marshal 4.8 (current) marshal header.
The other characters are a quote ("), which means “look, a string is coming up”. And \v, which is a string length.
>> "\v".unpack("C").first - 5
=> 6
Yeah, it takes a little math, but there you can see it: \v is byte 11, and taking away 5 gives a string length of 6. So, in summary: this is a Marshal 4.8 string (dumped by Ruby 1.8.4) containing a string with six characters and they are Koichi.
Okay, so, it turns out that everything that gets Marshalled comes with these offset bytes (like \v above) which measure strings and hashes and arrays and floats. (So do Python’s pickle and many other binary serialization formats.)
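To make the "subtract 5" math concrete, here is a rough sketch of a reader for those measuring bytes. The name read_marshal_int is made up for this post (it is not part of Ruby or of pickpocket.rb), and the multi-byte branch reflects my understanding of the 4.8 format: a first byte of 1 to 4 (or -1 to -4) says how many little-endian bytes follow.

```ruby
# Hypothetical helper: decode a Marshal-style integer (used for lengths
# and counts) starting at byte offset pos.
# Returns [value, offset_of_the_next_unread_byte].
def read_marshal_int(data, pos)
  c = data.getbyte(pos)
  c -= 256 if c > 127                  # reinterpret as a signed byte
  pos += 1
  return [0, pos] if c == 0            # zero is stored as a plain zero byte
  return [c - 5, pos] if c > 4         # small positive n is stored as n + 5
  return [c + 5, pos] if c < -4        # small negative n is stored as n - 5
  # Otherwise |c| (1..4) is the number of little-endian bytes that follow.
  n = c.abs
  value = 0
  n.times { |i| value |= data.getbyte(pos + i) << (8 * i) }
  value -= (1 << (8 * n)) if c < 0     # negative values are sign-extended
  [value, pos + n]
end

length, next_pos = read_marshal_int("\v", 0)
puts "length #{length}, next byte at #{next_pos}"
```

Strings longer than 122 bytes take the multi-byte route, so a real skipper has to handle both shapes.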
The PickPocket hack is based on these two ideas:
- You can skip through marshalled objects pretty easily.
- By slapping a header in the middle of a marsh, you can load only certain fragments.
Take this array:
>> Marshal.dump ["Goto80", "Treewave", "YMCK"]
=> "\004\010[\010\"\vGoto80\"\rTreewave\"\tYMCK"
This Marshal reads out loud like this:
Header, Array(3)[ String(6), String(8), String(4) ]
So, if we want to load the second element, we can do some math to find where that element lives in the marsh. Skip the header (2 bytes), skip the Array counter (2 bytes), skip the first string (2 bytes + 6 bytes)... which leaves us at position 12.
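The same skipping can be sketched in code. This is only an illustration, not the pickpocket.rb implementation: string_offset and short_len are hypothetical names, the sketch assumes an array holding nothing but plain strings, and it only handles the one-byte length form. It uses the 1.8-style bytes from above; newer Rubies wrap non-binary strings with encoding info, so a fresh Marshal.dump may look different there.

```ruby
# Hypothetical helper: read a one-byte Marshal length (0, or n stored as n + 5).
# Returns [length, offset_of_the_next_unread_byte].
def short_len(dump, pos)
  c = dump.getbyte(pos)
  return [0, pos + 1] if c == 0
  raise ArgumentError, "multi-byte or negative length not handled" unless c > 4 && c < 128
  [c - 5, pos + 1]
end

# Hypothetical helper: find the byte offset of element `index` in a
# marshalled array that contains only plain strings.
def string_offset(dump, index)
  pos = 2                                        # skip the two header bytes
  raise ArgumentError, "not an array" unless dump.getbyte(pos) == "[".ord
  count, pos = short_len(dump, pos + 1)          # skip the Array counter
  raise IndexError, "only #{count} elements" unless index < count
  index.times do
    raise ArgumentError, "not a string" unless dump.getbyte(pos) == '"'.ord
    len, pos = short_len(dump, pos + 1)          # skip type byte and length byte
    pos += len                                   # skip the string's own bytes
  end
  pos
end

dump = "\004\010[\010\"\vGoto80\"\rTreewave\"\tYMCK"
puts string_offset(dump, 1)
```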
Now, will it let us load from the middle of the Array? Or what?
>> str = "\004\010[\010\"\vGoto80\"\rTreewave\"\tYMCK"[12..-1]
=> "\"\rTreewave\"\tYMCK"
>> Marshal.load(str)
TypeError: incompatible marshal file format (can't be read)
  format version 4.8 required; 34.13 given
Oh, wait! The header!
>> Marshal.load("\004\010" + str)
=> "Treewave"
Hey, klawboom!! That worked. It loaded the object and ignored anything after it.
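Wrapped up as a tiny helper, the trick might look like the sketch below. load_fragment and MARSHAL_HEADER are names invented here, not anything from pickpocket.rb. One caveat I would expect: a fragment that refers back to an earlier symbol or object in the dump (Marshal's link entries) probably won't load on its own, so this is safest on self-contained values like these strings.

```ruby
# The two header bytes for Marshal format 4.8, as a binary string.
MARSHAL_HEADER = "\004\010".b

# Hypothetical helper: load the single object that starts at byte `offset`
# inside a marshalled dump, by gluing a fresh header in front of it.
# Whatever follows that object in the dump is ignored by Marshal.load.
def load_fragment(dump, offset)
  Marshal.load(MARSHAL_HEADER + dump.b.byteslice(offset..-1))
end

dump = "\004\010[\010\"\vGoto80\"\rTreewave\"\tYMCK"
puts load_fragment(dump, 12)
```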
The final part of this hack is to come up with the code for walking down into the marsh and coming up with the object we want. Here’s what I’ve got in mind.
Let’s use, as our sample corpus, a marshalled dump of the RubyGems repository. It’s of a nice, wieldsome size (2M) and it would be nice to reach in and grab one gem.
>> PickPocket(File.read('rubygems.m')).gems['hpricot-0.4-mswin32'].get
=> #<Gem::Specification:0x811b124 @name="hpricot" ...>
Instead of actually loading all the objects in the dump, this query is executed when the get method is run. It’ll search the rubygems.m file for a gems instance variable. And then it’ll search that variable for an hpricot-0.4-mswin32 key.
So far, it all fits in about a hundred lines of code: pickpocket.rb. More marshal hacking tomorrow.
Update: The RubySpec wiki has started a page on the Marshal format which looks to be a good start.