Hpricot Strikes Back
My my. How the sensationalist press does carry on.
Peter Cooper: On an Hpricot vs Nokogiri benchmark, Nokogiri clocked in at 7 times faster at initially loading an XML document, 5 times faster at searching for content based on an XPath, and 1.62 times faster at searching for content via a CSS-based search. These are impressive results, since Hpricot was previously considered to be quite speedy itself.
I feel just awful (just supreeeeemely lousy) that these benchmarks were only good for four days. Nokogiri is no longer seven times faster than Hpricot.
And that means these guys have to go back through all their docs and promotional materials and… wow, what a job it’s going to be. It’s just a tough situation, folks. My heart goes out to all the fine young lads who worked so hard to bring Hpricot down, only to discover, hey, boss, there she goes! Hpricot is strolling right along the boardwalk, smiling, waving, checking its watch, fit as a fiddle.
This fruit is tiny, shiny and can be spit-polished in a single weekend.
Before we get to the news… Here’s the XML those Nokogiri benchmarks are based on: sin.xml.
Someone help me here. Am I reading that right? There are six XML tags in that whole file. Is this for reals? Six, right?
<location>
<refUrl>http://wikitravel.org/en/Singapore</refUrl>
<info>
<b>Singapore</b> is an island-state in Southeast
Asia, connected by bridges to Malaysia. Founded as a British trading colony
in 1819, since independence it has become one of the <b>world's
most prosperous countries</b> and sports the world's busiest
port. Combining the skyscrapers and subways of a <b>modern,
affluent city</b> with a medley of Chinese, Indian and Malay
influences and a <b>tropical climate</b>, with
tasty food, good shopping and a vibrant nightlife scene, this Garden City
makes a great stopover or springboard into the region.
</info>
</location>
This benchmark was linked all over the place last week. Does anyone look at this stuff?
Okay, great. Let’s battle!
Time for a new benchmark based on timeline.xml from John Nunemaker’s libxml vs. hpricot stuff.
                     user     system      total        real
hpricot:doc      2.630000   0.030000   2.660000 (  2.655527)
hpricot2:doc     0.340000   0.000000   0.340000 (  0.349340)
nokogiri:doc     0.600000   0.020000   0.620000 (  0.611570)

                     user     system      total        real
hpricot:xpath    1.910000   0.000000   1.910000 (  1.911496)
hpricot2:xpath   0.890000   0.010000   0.900000 (  0.897664)
nokogiri:xpath   0.060000   0.000000   0.060000 (  0.061546)

                     user     system      total        real
hpricot:css      1.880000   0.000000   1.880000 (  1.889301)
hpricot2:css     0.680000   0.010000   0.690000 (  0.677072)
nokogiri:css     benchmark.rb:77: [BUG] Bus Error
ruby 1.8.6 (2007-09-24) [i686-darwin8.11.1]
Try it yourself with the hpricot-0.6.170 gem, which includes source code, so you’ll need a compiler.
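If you want to poke at this yourself, here is a rough sketch of how timings like the ones above could be gathered. It is not the benchmark script used for this post: `time_parsers` and `PARSERS` are made-up names, the run count is arbitrary, and it only measures document loading, through `Hpricot.XML` and `Nokogiri::XML`, for whichever gems happen to be installed.

```ruby
require 'benchmark'

# Hypothetical helper: times each parser callable against the same XML string.
# `parsers` maps a label to a lambda that takes the XML text and parses it.
# Returns a hash of label => wall-clock seconds for `runs` parses.
def time_parsers(parsers, xml, runs = 100)
  parsers.each_with_object({}) do |(label, parse), timings|
    timings[label] = Benchmark.realtime { runs.times { parse.call(xml) } }
  end
end

# Register only the libraries that are actually installed.
PARSERS = {}

begin
  require 'hpricot'
  PARSERS['hpricot:doc'] = ->(xml) { Hpricot.XML(xml) }
rescue LoadError
  # hpricot gem not available; skip it
end

begin
  require 'nokogiri'
  PARSERS['nokogiri:doc'] = ->(xml) { Nokogiri::XML(xml) }
rescue LoadError
  # nokogiri gem not available; skip it
end

# Usage (assumes a local copy of timeline.xml):
#   xml = File.read('timeline.xml')
#   time_parsers(PARSERS, xml).each do |label, secs|
#     puts format('%-14s %.6f', label, secs)
#   end
```

Passing the parsers in as lambdas keeps the timing loop identical for every library, so the only thing that varies between rows is the parse call itself.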
This is only a rewrite of the parser, not the Ragel lexer. I’m actually surprised the XPath and CSS parser numbers are cut in half, basically, just by changing my object structures. I’ve just finished a new Ragel-based CSS selector parser which should cause those searches to drop dramatically. I am considering dropping XPath support this time.
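To put rough numbers on "cut in half", the ratios can be worked out from the "real" column of the benchmark output above. This is only a small arithmetic sketch: the `REAL` table and the `speedup` helper are illustrative names, and the figures are copied straight from that one run.

```ruby
# Wall-clock ("real") seconds copied from the benchmark output above.
REAL = {
  doc:   { hpricot: 2.655527, hpricot2: 0.349340, nokogiri: 0.611570 },
  xpath: { hpricot: 1.911496, hpricot2: 0.897664, nokogiri: 0.061546 },
  css:   { hpricot: 1.889301, hpricot2: 0.677072 } # nokogiri hit the Bus Error here
}.freeze

# How many times faster `fast` ran than `slow` on the given test.
def speedup(test, slow, fast)
  (REAL[test][slow] / REAL[test][fast]).round(2)
end

puts "doc:   new parser vs old Hpricot  #{speedup(:doc, :hpricot, :hpricot2)}x"
puts "doc:   new parser vs Nokogiri     #{speedup(:doc, :nokogiri, :hpricot2)}x"
puts "xpath: new parser vs old Hpricot  #{speedup(:xpath, :hpricot, :hpricot2)}x"
puts "xpath: Nokogiri vs new parser     #{speedup(:xpath, :hpricot2, :nokogiri)}x"
puts "css:   new parser vs old Hpricot  #{speedup(:css, :hpricot, :hpricot2)}x"
```

These are ratios from a single run on a single machine, so treat them as a snapshot rather than a verdict.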
I haven’t got the new parser totally switched in yet. Right now you call Hpricot.scan without a block. Once I’m finished testing the two side-by-side, I’ll swap in the new parser and release 0.7.
I feel some regret posting a benchmark at all, because I don’t want to detract from my main point.
Someday, Nokogiri may be seven times faster than Hpricot. Someday it may be twelve times slower. In fact, on one single day, it may be five times faster, then fourteen times slower, then eleven-point-three times faster!
But Nokogiri has no fuzzy fruited emblem. And it does not dwell in an orchard of markup. (Such a very yummy orchard, you’d never believe!) I can put those statements in my promos and newsreels and they’ll never change.
Now begin the comments …
jontyjont
Well said _Why!!
I think perhaps that some people are unripe fruit coloured at your excellence!
Alistair Holt
Hooray for Hpricot!
Matt Aimonetti
Great stuff, I’m actually pretty happy that nokogiri and hpricot are both trying to improve perfs. I know of people saying that the Ruby community doesn’t have an awesome parser like Python’s. I’m pretty happy to see that things are improving!
Thanks _why for your valuable contribution.
-Matt
Aaron Patterson
If my only contribution is that I motivate you to update Hpricot, then I have achieved my goal.
Thanks.
_why
Aaron Patterson:
Wait a minute. But I am only working on Hpricot to put some pressure on Nokogiri and LibXML. So they can live up to their potential. So that you can grow as a person. I actually think that, for xml, Nokogiri is going to school Hpricot’s sorry ass-shaped apricot crease. I can’t turn off my flexible parsing mode, it’s built-in. I can’t do checks for well-formedness, nor am I able to parse utf-16 and utf-32. And, like I said, I’ll probably drop XPath support since jQuery did the same.
Aaron Patterson
_why: Thanks for the motivation! I can feel the pressure! urnnghhhh! ack!
defunkt
Hpricot is dead, long live Hpricot!
doki_pen
Sowwy _why. To hew wit pawsing speeds. He wins on haiw alone.
Jon
I just want to thank Aaron Patterson, _why, and the LibXML-Ruby team for putting together these great, open source libraries. It’s so amazing to have choices and to see you guys constantly upping the ante, that I just may have to start committing code to y’all.
leethal
The Return of the Epic and Forgotten Hpricot Library.
Peter Cooper
I’m proud to be considered sensationalist by the king of sensationalism himself. If it gets people talking, thinking, and doing, it’s a very rewarding thing to stir the pot as you know yourself!
_why
Get your grimey, upskirt-hungry camera lenses out of here, Peter Coooper!! And take your no-good besmirched and libel-stained microphones with you. Humpf.
Dr Nic
I wish hpricot could render its bountiful logo on my terminal after I had finished installing it.
Dr Nic
I had to change line 9 to
to avoid this error
_why
And tell your goons in the Ruby Inside shirts to unhand me. I’m very delicate.
Peter Cooper
You can tell I’ve been taking Rupert Murdoch’s correspondence course by tape! Anyway, stop giving me ideas – I’m now trying to think of how I can work the word “upskirt” into my next post.
Peter Cooper
I’ve updated the Ruby Inside post to recognize the new greatness that is Hpricot.
Alex Pooley
Noooo.. whyyyyy.. people will die and children will bleed when you drop xpath support.
How else will I so elegantly extract values of nodes and attributes with just CSS selectors?
Think of the children why. The children!
Dr Nic
A quick search of this post finds that the first reference to “upskirt” is in your own comment… what were you looking at just before you commented here? :)
Dr Nic
Actually I wish that was true… I suck at in-line browser searching apparently.
collintmiller
Hear, hear!
Glad to hear about giving xpath the boot.
Never used it. And project managers go apesh*t for xpath, seemingly ignorant that we can reuse our css skillset. And just use json on the wire…
Curses
Senator the Unicorn
Removing X-Path will help a great many PTSD sufferers in dire need of a framework which doesn’t bring back terrible memories of a past where they were forced to create XSLT documents all day long.
The community thanks you ever so muchly for your consideration and kind heartedness towards these oft-forgotten tech veterans.
Chu Yeow
Just a bit of trivia: that Nokogiri vs. Hpricot benchmark was mine and it was a specific revision of Nokogiri vs. a specific revision of Hpricot that we were using on our own XML API (the http://static.bezurk.com/fragments/wikitravel/sin.xml file that had like 6 XML tags), it was never intended to be a comprehensive benchmark :) It sure convinced us to use Nokogiri at this time though since it benchmarked actual code in my own application (and thus was an entirely practical benchmark for my own purposes of deciding whether to switch).
Anyway, I’m liking the competition if it means nicer and faster code all around!
rick
When can we get one of these in the stdlib to replace rexml?
Alastair Brunton
Top class why, you are my hero!
hosiawak
rick: why would you want to replace rexml? Can you parse a large xml file as a stream using Hpricot or Nokogiri ?
PeeDee
Benchmarks. My benchmark:
It’s the elegance and ease of use that matters to me, _why. Thanks.
Peter Szinek
Dropping XPath support would suck – for example scRUBYt! is relying solely on Hpricot XPath support, and I somewhat doubt it’s the only gem depending on Hpricot using XPaths (?).
topfunky
Such drama!
I especially liked the part where nokogiri threw a Bus Error. I did not expect that.
anildigital
seriously nice drama! I hope competition continues!
SeanJA
Throwing buses is never the answer
stepheneb
I like the xpath support in hpricot — I use it for transmogrifying xml documents … I also like that I can use it in jruby.
HULK
HULK tell SeanJA: Throwing buses ALWAYS good answer!
RubyPanther
I wouldn’t recommend actually using the xpath support in a real project, but I think it’s really useful to have it there for when you want to get down and dirty, or when you’re stuck sharecropping, or even debugging.
Not to mention that I for one am usually pleased by backwards compatibility.
Lawrence Pit
Nokogiri released Oct 31st. Nov 7th it slipped into webrat which previously used hpricot. Nov 8th Merb releases its v1.0, which requires webrat, and hence nokogiri.
I didn’t run into a bus btw :
Gem used is nokogiri-1.0.3.
Anyways, I’ll always love those fruity markup!
ryan_a
Nnokogirl threw a bus? Them’s some big muscle arms!!
huh?
Yes I’m new and only understand half the things said here. But let me be so bold as to ask…what is wrong with XPath? We use it all the time with Hpricot and it works well.
lzell
Another vote for keeping XPath support. What is motivating you to drop it?
Anko Painting
Hey Why,
what’s going on with the hpricot bug tracker? Any plans to support 1.9.1? :)
why what the heck happened to hpricot?
I googled hpricot…
Clicked on the first result…
https://code.whytheluckystiff.net/hpricot/
Failed to Connect…