RedHanded

Injecting a Hash Backwards and the Merge Block

by why in inspect

Here’s a fun snippet cooked up for the Camping 1.3 release due later today, laid out a bit nicer so you can play with it on your own.

The goal: parse a query string in as few bytes as possible, while allowing the Hash-like syntax from PHP and Rails.

 def qs_parse(qs)
   qs.split(/[&;]/n).
      inject({}) { |h,p| 
        k, v = p.split('=',2)
        h.merge(
          k.split(/[\]\[]+/).reverse.
            inject(v) { |x,i| {i=>x} }
        ){|_,o,n|o.merge(n)}
      }
 end

Believe me, there’s a real zen in the inner-inject/merge-block routine. Wield it like so:

 >> qs_parse("name=Philarp+Tremain&hair=sandy+blonde")
 => {"name"=>"Philarp+Tremain", "hair"=>"sandy+blonde"}
 >> qs_parse("post[id]=4&post[nick]=_why&post[message]=GROSS!&a=1")
 => {"a"=>"1", "post"=>{"message"=>"GROSS!", "nick"=>"_why", "id"=>"4"}}

Obviously, in the final version, stuff gets unescaped and all that. This exercise only focuses on parsing the structure. It would be nice for the merge-block to be tail recursive. It only goes one level presently.
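
For anyone who wants to see the moving parts, here is an expanded sketch of the same routine. The method name and variable names are illustrative (they are not taken from Camping), and the sketch deliberately keeps the one-level limitation of the merge block described above.

```ruby
# Expanded sketch of the first qs_parse. Names are hypothetical.
def qs_parse_expanded(query_string)
  query_string.split(/[&;]/).inject({}) do |result, pair|
    key, value = pair.split('=', 2)

    # "post[id]" splits into ["post", "id"]; reversing gives ["id", "post"].
    path = key.split(/[\]\[]+/).reverse

    # Fold from the innermost key outwards:
    # "4" becomes {"id"=>"4"}, which becomes {"post"=>{"id"=>"4"}}.
    nested = path.inject(value) { |inner, name| { name => inner } }

    # The block only runs when both hashes share a top-level key.
    # It merges the two inner hashes, one level deep and no further.
    result.merge(nested) { |_key, old, new| old.merge(new) }
  end
end

parsed = qs_parse_expanded("post[id]=4&post[nick]=_why&a=1")
```

Building the nested hash backwards is what lets a plain `inject` do the work: the value is wrapped by the last key first, so no bookkeeping about the current depth is needed.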

Update: Here’s a right-good one which recurses to any depth and builds an array from duplicate entries as Dan pointed out.

 def qs_parse(qs)
   m = proc {|_,o,n|o.merge(n,&m)rescue(o.to_a<<n)}
   qs.split(/[&;]/n).
      inject({}) { |h,p| 
        k, v = p.split('=',2)
        h.merge(
          k.split(/[\]\[]+/).reverse.
            inject(v) { |x,i| {i=>x} },&m)
      }
 end
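
A note for readers on newer Rubies: from Ruby 1.9 onwards, String no longer responds to `to_a`, so the `rescue(o.to_a<<n)` branch above raises instead of building an array. The following is a minimal sketch of the same idea with explicit type checks in place of the rescue. The name `qs_parse_modern` and the array-wrapping logic are my own substitutions, not part of the original.

```ruby
# Sketch only: same structure as the update above, without String#to_a.
def qs_parse_modern(qs)
  merger = proc do |_key, old, new|
    if old.is_a?(Hash) && new.is_a?(Hash)
      # Both sides are hashes: recurse with the same block.
      old.merge(new, &merger)
    else
      # Duplicate key with a non-hash on either side: collect into an array.
      (old.is_a?(Array) ? old : [old]) + [new]
    end
  end

  qs.split(/[&;]/).inject({}) do |h, pair|
    k, v = pair.split('=', 2)
    h.merge(k.split(/[\]\[]+/).reverse.inject(v) { |x, i| { i => x } }, &merger)
  end
end

deep = qs_parse_modern("a[b][c]=1&a[b][d]=2")
dups = qs_parse_modern("foo=4&foo=6")
```
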
said on 25 Jan 2006 at 13:39

oooh! Pretty!

said on 25 Jan 2006 at 13:46

Hmm…

 >> _.class
 => NilClass

said on 25 Jan 2006 at 14:03

_ is just a variable name, best used as a placeholder in tiny code. It is used above elegantly with the block-enabled Hash#merge to handle cases where two hashes share the same key but hold their own values. {|key,oldval,newval| oldval.merge(newval)} blends the two hashes from the query string.

so foo[bar]=baz and foo[qux]=bop ends up being the full foo={bar=>baz, qux=>bop}
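
The difference between a plain merge and a merge with a block can be seen in isolation. This is a small sketch using the foo/bar/qux values from the comment above; the variable names are only for illustration.

```ruby
first  = { "foo" => { "bar" => "baz" } }
second = { "foo" => { "qux" => "bop" } }

# Without a block, the later value simply replaces the earlier one.
overwritten = first.merge(second)

# With a block, Ruby calls it for every key present in both hashes
# and stores whatever the block returns.
blended = first.merge(second) { |_key, oldval, newval| oldval.merge(newval) }
```
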

man yeah i’d love to see some arbitrarily deep hash support (_why, you should have posted this at 4:20 hehe)

mmmmmmmmmm

said on 25 Jan 2006 at 15:07

The ‘n’ option on the regex is what? Multibyte aware or something?

said on 25 Jan 2006 at 15:17

fansipans: I’m with you. This post was underorchestrated.

Harold: Yes, another byte gone. GLAD.

said on 25 Jan 2006 at 15:44

you can just have

k,v = p.split '='

and the other split elements will be discarded.

also, unless you want to match

post[[id]]=4

or something similarly perverse, the inner split’s regex can be

/[\]\[]/

without the plus sign (or /[][]/ if you don’t mind warnings).

lee
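
Both suggestions save bytes but change behavior on some inputs. A short sketch of the trade-off, with sample strings invented for the purpose:

```ruby
pair = "redirect=/search?q=ruby"

with_limit    = pair.split('=', 2)   # stops after the first "="
without_limit = pair.split('=')      # splits on every "="

# Extra elements are dropped by the multiple assignment,
# so anything after a second "=" is lost.
_key, value = without_limit

nested_key = "post[id][x]"
with_plus    = nested_key.split(/[\]\[]+/)  # "][" is consumed as one separator
without_plus = nested_key.split(/[\]\[]/)   # "][" leaves an empty string between
```

So the limit argument matters whenever a value may contain an equals sign, and the plus sign matters as soon as keys nest more than one level.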

said on 25 Jan 2006 at 16:13

Why do you write the code so tersely? Is it to keep the file size down? It makes it really hard for beginners to follow what is going on. I know I am asking a lot, but could you post an expanded version (full variable names, generous white space) for these short expositions? I think I could learn more that way.

said on 25 Jan 2006 at 16:23

Hmm, I thought the URI class had some helpers for this already, but I don’t see anything.

Perhaps URI could use some polish?

# Ideal syntax (?)
uri = URI.parse(qs)
uri.query.name # Philarp Tremain
uri.query.hair # Sandy Blonde
I guess we’ll need a URI::Query class first, eh?
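
For readers arriving later: Ruby's standard library has since gained `URI.decode_www_form`, which covers the flat case. A minimal sketch follows. It returns key/value pairs rather than the method-style access imagined above, and as far as I know it does not interpret the bracket syntax for nested keys.

```ruby
require 'uri'

# Decodes "+" and percent-escapes, returning an array of [key, value] pairs.
pairs = URI.decode_www_form("name=Philarp+Tremain&hair=sandy+blonde")

# Convert to a Hash when duplicate keys are not a concern.
query = pairs.to_h
```
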
said on 25 Jan 2006 at 17:13

????: There’s an expanded version in camping svn.

said on 25 Jan 2006 at 17:56

Golfing code is meant to challenge. It’s like Sudoku or Rubik, but the kick of it is that you’re actually left with something handy.

Daniel: There’s a nice thought. I’m sure it could wipe out some code in WEBrick, too.

said on 25 Jan 2006 at 19:55

It barfs on multiple values for the same name, which is legal HTML:

 qs_parse("foo=4&foo=6")

said on 26 Jan 2006 at 00:28

Okay, updated to be recursive and handle the bug Dan found.

said on 26 Jan 2006 at 02:36

fansipans: Turns out _ is something irb adds to confuse me…

said on 26 Jan 2006 at 07:22

Nice fix! On a separate note, I’m a Java developer (boo, hiss) who’s really excited about Ruby. But one of the things I don’t like is when it becomes too ‘Perl-like’ (read: impenetrable, brevity over clarity), and I’m wondering if you see this code snippet as just an exercise, or whether you would write code this way on a large project meant to be shared with other developers. It’s probably just my inexperience with Ruby talking, but if I had to debug this or add a feature to it, I would be well screwed. So, is this meant to be poetry? As that, it’s wonderful.

said on 26 Jan 2006 at 11:05

I admit it took me a while to parse through that code mentally. The irony, though, is that despite its almost Perlish density, the functional programming crowd would probably consider it not merely elegant, but the “right” way to do it. And not without reason, either. For all its complexity, it’s not relying on any special rules or weird syntactic tricks like Perl might. Although it would probably be a little prettier in Haskell, and the ‘inject’s would be called ‘fold’.

This isn’t really an opinion one way or another; just an observation that the Functional and Scripting worlds are getting closer and closer together.

said on 26 Jan 2006 at 12:27

Ok, that took me a few minutes and my copy of the pickaxe, but I got it. Very nice. The only thing I don’t get is why there’s a rescue in the m Proc. I thought if merge found two elements the same it would overwrite the old with the new, not raise.

said on 26 Jan 2006 at 19:55

Why’s page has been squarely aimed at Ruby experts for all of this year and most of last in my opinion. I had to read the column for over a year (and practice Ruby independently) before I was ready to seriously start understanding the examples like Dwemthy. The fact of the matter is, anybody who doesn’t like Ruby won’t learn it, and if you don’t learn it you won’t have to debug it because you won’t be doing it. simple!

said on 26 Jan 2006 at 20:11

cilibrar: _why’s hijinx definitely have a very challenging spirit, i know it’s encouraged me to beef up and understand more and more… co-workers at a big meeting yesterday got a big kick (and nodded in agreement) with my use of the words “kung-fu” and “zen” in reference to proper design ;)

said on 27 Jan 2006 at 02:57

THBMan

foo=1&bar=2 : no duplicate keys, proc no fire, merges with no issue

foo[bar]=1&foo[baz]=2 : duplicate outer keys, merge asks proc to fire, bar and baz are inner unique keys (if they weren’t, recursive proc call), no rescue, merges like above

foo=1&foo=2 : duplicate key, proc fires, 1 and 2 are not hashes, rescue fires, foo key points to array in the “merged” hash and values accumulate

merge by default overwrites. The proc given to it doesn’t do that.

Or at least that’s my understanding of the _whyjinx.

said on 27 Jan 2006 at 18:26

That seems pretty complicated if the goal really is to accomplish the task in as few bytes as possible.

The following function should fulfill the informal spec about as well as the updated version above, and has only 122 non-whitespace characters compared to 177 in the original.


def qs_parse(qs)
  h = s = {}
  $3?(s[$1], s = s[$1] ? s[$1].to_a << $3 : $3, h):s = s[$1] ||= {} while qs.gsub!(/^\[?(\w+)\]?(=([^&;]*).?)?/, '')
  h
end

(Sorry if I missed any obvious golf tricks; this is my first Ruby program).

said on 27 Jan 2006 at 19:43

Juho: What is the “informal spec” that you are referring to? There are query strings for which your function doesn’t appear to work properly (as well as other cases for which the updated version above still doesn’t seem to work properly) but I am not clear whether such strings are supposed to be permitted, so I’d like to see the spec.

said on 27 Jan 2006 at 22:34

Sorry, by “informal spec” I pretty much meant this blog post and the comments, not an actual specification.

The examples specify the intended basic behaviour. The comments suggest that strings like foo[bar][baz]=1 should be handled properly. And the lack of any error detection (e.g. checking that the brackets are balanced) in the original means that such things matter less than making the code short ;-)

said on 28 Jan 2006 at 00:07

Juho, you wild man, well played. There’s no rules here, just an unfenced sporting arena with starving, sickly grass. I’m with you on the error detection, let’s just parse it into something.

said on 28 Jan 2006 at 00:26

Oh, Juho. That qs_parse is going to need a dup in there, because the string passed in gets destroyed by the gsub!.

Also, mine does parse camp[]=Dooley&camp[]=Rheinhart into {"camp" => ["Dooley", "Rheinhart"]}, which does right with the Rails/PHP rules.
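
The point about dup can be shown with a tiny stand-in. The helpers below are hypothetical and are not Juho's parser; they only illustrate how a `gsub!`-based routine eats the caller's string and how working on a copy avoids that.

```ruby
# Hypothetical stand-in for any parser that consumes its input with gsub!.
def first_key_destructive(qs)
  key = qs[/\A\w+/]
  qs.gsub!(/\A\w+/, '')   # mutates the string the caller passed in
  key
end

# Same result, but the caller's string is left alone.
def first_key_safe(qs)
  first_key_destructive(qs.dup)
end

original = "name=why&hair=sandy".dup   # dup so the literal is not frozen
safe_key = first_key_safe(original)
untouched = original.dup               # snapshot after the safe call

first_key_destructive(original)        # original has now lost its first key
```
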

said on 28 Jan 2006 at 00:32

Juho: Thanks for the clarification. What I was wondering about wasn’t such errors, but whether strings like foo=1&foo[bar]=2&foo[baz]=3 should be handled. I would expect that to return something like {"foo"=>["1", {"bar"=>"2"}, {"baz"=>"3"}]} as the original does but yours raises an error. However, the original doesn’t correctly handle similar strings such as foo[bar]=2&foo[baz]=3&foo=1, i.e., the same string but with the parameters simply in a different order. I had modified the original so I think it handles such cases like the above, but was wondering how they should be treated.

said on 28 Jan 2006 at 00:54
By the way, the original parses strings like these:
 
  camp[]=&camp[]=Rheinhart
  camp[]=Dooley&camp[]=
 
into
 
  {"camp"=>["Rheinhart"]}
  {"camp"=>["Dooley", ""]}
 
Again, I’m not sure how they should be handled, but the second parsing seems to make sense (and the same change I mentioned above also addressed situations like the above in a seemingly consistent way). What’s the desired behavior?
said on 28 Jan 2006 at 01:10

Can someone please tell me where I can find “the Rails/PHP rules” for handling these types of query strings? It sounds like that will answer the questions I had above. Thanks.

said on 28 Jan 2006 at 13:06

I had a situation where I wanted to read from a text file that had on each line something like “foo:bar”. Obviously, in my Ruby script I want to get a hash representation. Not very elegant, I’d guess, but anyway, this is what I did:

hsh = Hash.new
File.open("test.txt").readlines.each { |line|
        hsh = hsh.update(Hash[ *line.split(":").
                map { |ele| ele = ele.match(/(.*)\n/)[1] rescue ele }
        ])
}
said on 28 Jan 2006 at 15:00

How about using scan: hash = {}; qs.scan(/(name|hair)=(.*?)(&|$)/) { hash.update($1 => $2)}; puts hash.inspect

said on 28 Jan 2006 at 18:24

Premshree: That’s very similar to this bit I saw on IRC a great while ago.

 Hash[*IO.read('test.txt').scan(/^(.+?):(.*)/).flatten]
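
To try the scan approach without a file on disk, here is a small sketch that uses an inline string in place of test.txt (the sample contents are an assumption). It also shows `to_h`, which newer Rubies offer as an alternative to the `Hash[*...flatten]` idiom.

```ruby
# Inline stand-in for the contents of test.txt.
text = "foo:bar\nbaz:qux\n"

# scan returns one [key, value] pair per matching line.
pairs = text.scan(/^(.+?):(.*)/)

from_splat = Hash[*pairs.flatten]   # the idiom from the comment above
from_to_h  = pairs.to_h             # equivalent on newer Rubies
```

Because `.` does not match a newline by default, the value capture stops at the end of each line, so no chomping is needed.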
said on 29 Jan 2006 at 11:00

For yet another qs.scan solution check out bigbold.com/snippets/user/ntk !
