RedHanded

Hpricot and Sandbox for Win32

by why in inspect

Mauricio checked in some Rakefiles for cross-compiling to win32, so I’ve got some win32 gems for Hpricot and the (FF)Sandbox. The majority of you can now:

 gem install hpricot --source code.whytheluckystiff.net
 gem install sandbox --source code.whytheluckystiff.net

THE SANDBOX ONE IS COMPILED FOR 1.8.4 SO IT HAS TOTALLY GOT HOLES. And yet, fun can still be had I’m sure.

As for cross-compiling, I’m using mingw on FreeBSD. Here’s how:

 cd /usr/ports/devel/mingw32-gcc
 sudo make install

 cd ~/sand
 wget http://ftp.ruby-lang.org/pub/ruby/binaries/mingw/1.8/ruby-1.8.4-i386-mingw32.tar.gz
 tar xzf ruby-1.8.4-i386-mingw32.tar.gz
 mv usr/local RUBY-1_8_4-MINGW32
 rm -rf usr

 cd ~/dev
 svn co https://code.whytheluckystiff.net/svn/sandbox/trunk sandbox
 cd sandbox
 export MINGW32_RUBY=/home/why/sand/RUBY-1_8_4-MINGW32
 export MINGW32_PREFIX=mingw32
 rake rubygems_win32

I mean that’s a lot better than putting together a VM and trying to track down a decent free Microsoft compiler. A couple weeks ago, I spent six hours on it and made no progress.

said on 21 Jul 2006 at 15:04

Yay! Hpricot totally rocks the Windows. Thanks for the speedy work on that!

said on 21 Jul 2006 at 15:10
Is this what you mean by VM?

VM/370 ONLINE
            VV        VV    MM        MM
            VV        VV    MMM      MMM
            VV        VV    MMMM    MMMM
            VV        VV    MM MM  MM MM
     3333333333     777777777777MMMM  00000000
   333333333333    77777777777  MM  0000000000
    33      VV33    77VV    77      00MM      00
             V33     VV    77M      00MM      00
              33    VV    77MM      00MM      00
           3333VV  VV    77 MM      00MM      00
           3333 VVVV     77 MM      00MM      00
              33 VV      77 MM      00MM      00
              33         77         00        00
    33        33         77         00        00
    333333333333         77          0000000000
     3333333333          77           00000000
said on 21 Jul 2006 at 15:35

I mean that’s a lot better than putting together a VM and trying to track down a decent free Microsoft compiler. A couple weeks ago, I spent six hours on it and made no progress.

Is there something wrong with Microsoft’s own C++ compiler (on Windows, that is, of course)? The toolkit is free as in beer.

said on 21 Jul 2006 at 15:41

These win32 binaries are provided by the puny MinGW toolchain, which doesn’t stand comparison with the almighty MS VC2005 Express, but empowers non-win32 developers to build binaries for their win32 users.

More details about the cross-compilation magic in the Rakefile, and here.

said on 21 Jul 2006 at 15:46

Asztal: First of all, it runs on win32 only :-) Second, and most importantly: it’s not binary-compatible with the VC6 ruby build (I read now that MinGW isn’t fully either, but it works for all but a few extensions).

said on 21 Jul 2006 at 16:03

Does Hpricot not support [] in an Xpath? I know I can index the elements from the resultant ruby array following a search, but I have to believe it could be faster using an internal lookup to an internal C data structure via an Xpath string.

said on 21 Jul 2006 at 17:13
OK, even though Hpricot isn’t an XML parser, it seems to handle RSS well enough, but feedburner’s atom feeds force this exception…
irb(main):008:0> Hpricot(URI.parse("http://feeds.feedburner.com/vedana").read)
/usr/lib/ruby/gems/1.8/gems/hpricot-0.3/lib/hpricot/parse.rb:164:in `build_node': 
[bug] unknown structure: 
[:xmlprocins, "href=\"http://feeds.feedburner.com/~d/styles/itemcontent.css\" 
type=\"text/css\" media=\"screen", nil, nil] (Exception)
 from /usr/lib/ruby/gems/1.8/gems/hpricot-0.3/lib/hpricot/parse.rb:59:in `make' 
 from /usr/lib/ruby/gems/1.8/gems/hpricot-0.3/lib/hpricot/parse.rb:59:in `make'
 from /usr/lib/ruby/gems/1.8/gems/hpricot-0.3/lib/hpricot/parse.rb:11:in `parse'
 from /usr/lib/ruby/gems/1.8/gems/hpricot-0.3/lib/hpricot/parse.rb:4:in `Hpricot'
 from (irb):8:in `irb_binding'
 from /usr/lib/ruby/1.8/irb/workspace.rb:52:in `irb_binding' 
 from /usr/lib/ruby/1.8/irb/workspace.rb:52
said on 21 Jul 2006 at 17:49

hoyhoy: Thank you for that. It’s fixed in SVN, and the feed parses fine.

The thing is: I’m not keeping any internal C data structures here. It’s usually the parsing that really slows you down, not the data structures. It would be faster to do it all in C, but then it would be something that I couldn’t possibly finish.

said on 21 Jul 2006 at 17:58

That makes supporting [] in XPath kind of moot. I’ve often considered making my own whizbang XML parser gem using flex/bison or spirit. Unfortunately, after having that kind of thought, I usually sit down and eat a sandwich and it goes away.

said on 21 Jul 2006 at 18:54

Haha, oh yeah dreaming about scanners and parsers is really fun, it’s too bad the reality is so vomitously bleak.

said on 22 Jul 2006 at 02:08

Well, maybe we should make that not so. :) (HAH)

said on 22 Jul 2006 at 02:32

I am the proud owner of a 2006 Win32 Sandbox, Unsafe Edition, with leather seats.

said on 22 Jul 2006 at 15:51
More XML weirdness:

irb(main):035:0> Hpricot(URI.parse("http://rss.slashdot.org/Slashdot/slashdot").read).search("item")[1].to_html.each_line { |l| puts l if l.match(/dc:date/) }; nil<dc:date>2006-07-22T19:29:00+00:00</dc:date>=> nilirb(main):036:0> puts Hpricot(URI.parse("http://rss.slashdot.org/Slashdot/slashdot").read).search("dc:date")[1]nil=> nil
said on 22 Jul 2006 at 15:53
Terminal.app ate my newlines.
irb(main):035:0> Hpricot(URI.parse("http://rss.slashdot.org/Slashdot/slashdot").read).search("item")[1].to_html.each_line
 { |l| puts l if l.match(/dc:date/) }; nil
<dc:date>2006-07-22T19:29:00+00:00</dc:date>
=> nil
irb(main):036:0> puts Hpricot(URI.parse("http://rss.slashdot.org/Slashdot/slashdot").read).search("dc:date")[1]
nil
=> nil
said on 22 Jul 2006 at 20:51
I added a colon to this match on line 102, and then search will find tags with colons.

m = expr.match %r!^([#.]?)([a-z0-9\\*_:-]*)!i
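
To see what the extra colon changes, here is a small sketch in plain Ruby (no Hpricot needed). `WITHOUT_COLON` is an assumption about what the pattern looked like before the edit, namely the same pattern minus the colon; `tag_part` is a hypothetical helper, not anything from traverse.rb.

```ruby
# Assumed pre-patch pattern: identical to the patched one, minus the colon.
WITHOUT_COLON = %r!^([#.]?)([a-z0-9\\*_-]*)!i
# Patched pattern, copied from the line above.
WITH_COLON = %r!^([#.]?)([a-z0-9\\*_:-]*)!i

# Hypothetical helper: returns the tag-name part that the pattern captures.
# Both groups are optional, so the match itself never fails.
def tag_part(pattern, expr)
  expr.match(pattern)[2]
end

puts tag_part(WITHOUT_COLON, "dc:date")  # prints "dc"
puts tag_part(WITH_COLON, "dc:date")     # prints "dc:date"
```

Without the colon in the character class, the capture stops at "dc", which would explain why the search for "dc:date" came back empty.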
said on 22 Jul 2006 at 20:52

Line 102 in traverse.rb, but you knew that.

said on 24 Jul 2006 at 08:56

... it’s not binary-compatible with the VC6 ruby build…

Dear God, why do people still use VC6?

said on 24 Jul 2006 at 10:34

Capn: Because it’s become a de facto standard, and tends to have good windows compatibility.

said on 24 Jul 2006 at 11:25

hoyhoy: You have a sneaky smiley man in your regexp! :-]

said on 25 Jul 2006 at 06:36

this is great! but i have a little problem: is v0.3 missing something? Container::Trav.filter calls to_node.subst_subnode on line 402, but that doesn’t seem to exist?

said on 25 Jul 2006 at 10:12

FlashHater: However, VC7 has the same perfect level of windows compatibility AND is one of the most standards compliant compilers on the market. Essentially, with the availability of VC7, there is no reason to use VC6.

said on 01 Aug 2006 at 10:28

So am I using hpricot wrong or is this a bug?

doc = Hpricot(open("http://usgenweb.org/"))
puts (doc/:a).length # => 15 but it should be 66

Also, the following errors out:

doc.to_s

said on 03 Aug 2006 at 12:52

Hpricot doesn’t support this:

doc.search("//table/tr/td[3]")

... does it?

I’m trying to fetch the 3rd td in each tr.

said on 04 Aug 2006 at 08:03

bearik, this should work

doc.search("//table/tr/td")[3]
said on 05 Aug 2006 at 09:38
bearik, I missed the word “each” in your post, so my previous recipe would not work. I think the only way is to use nested searches. Something along the lines of:
doc.search("//table/tr").each do |x|
  x.search("/td")[2]  # third td of this row (arrays index from 0)
end
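
The difference between the two recipes comes down to where the index is applied. Here is a rough sketch using plain nested arrays as a stand-in for rows and cells. It is not Hpricot output, and `ROWS`, `cell_from_flat` and `nth_in_each_row` are made-up names for illustration.

```ruby
# Stand-in data: each inner array plays the role of one <tr>,
# each string the role of one <td>.
ROWS = [
  %w[a1 a2 a3 a4],
  %w[b1 b2 b3 b4],
]

# Indexing the flat list of every cell picks a single cell overall.
def cell_from_flat(rows, index)
  rows.flatten[index]
end

# Indexing inside each row picks one cell per row.
def nth_in_each_row(rows, index)
  rows.map { |cells| cells[index] }
end

puts cell_from_flat(ROWS, 3)           # prints "a4"
puts nth_in_each_row(ROWS, 2).inspect  # prints ["a3", "b3"]
```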
said on 07 Aug 2006 at 10:29
 doc.search("table tr td:nth(3)")

There’s still some unimplemented XPath. Refer to the supported CSS selectors for other ideas.
