What's Shiny and New in Ruby 1.8.0?

A massive handshake to Yukihiro Matsumoto, who has completed version 1.8.0 of the agile Ruby language. This work has been in progress for several years now, since Matz forked the 1.7 branch in mid-2001.

Many of 1.8.0's new features have been long awaited by the Ruby crowd. I'm just going to run through a few of my favorite new features that I've longed to have in a stable Ruby version. I'm not going to include all of 1.8.0's features, only the ones that have stuck out to me. If I make a terrible omission, please let me know. I'd love to build on this.

(Incidentally, a really nicely formatted write-up of changes has been done by Michael Granger.)

Core Changes

First, I'll detail a number of changes that affect the core set of classes that Ruby operates upon. Matz has spent a lot of time reviewing and cleaning these classes in detail. For example, Matz' work on cleaning up the Block/Proc classes took him several months to decide upon and implement. I can tell you that he is really thinking through each small change made in Ruby.

The allocate method

Scanning through RAA, you will find many libraries which use a variety of strange techniques for creating objects without calling class constructors. The ability to build a blank class instance is incredibly useful. Perhaps you're loading data from a database and you want to populate classes based on their property set. Perhaps you're writing a serialization library. Perhaps you want to copy only certain parts of an object. Ultimately, you want control over the allocation and initialization of an object.

Historically, SOAP4R has used Marshal.load to create a blank object. This technique works well, but involves some sketchy logic to assemble a marshalled string.

Providing that our class name is a String in variable class_name and a Hash of properties and their values is stored in class_props:

 msh = "\004\006o:%c%s\000" % [ class_name.length + 5, class_name ]
 o = ::Marshal.load( msh )
 class_props.each_pair { |k,v| o.instance_eval "@#{k} = v" }

Another common way of dealing with this is found in Chris Morris' clxmlserial library. This approach involves shorting the constructor by temporarily aliasing it.

Here's an allocate technique which leverages alias:

 class Class
   def allocate

     self.class_eval %{
       alias :old_initialize_with_args :initialize
       def initialize; end
     }

     begin
       result = self.new
     ensure
       self.class_eval %{
         undef :initialize
         alias :initialize :old_initialize_with_args
       }
     end

     result
   end
 end

While both are neat bits of code, they are both obviously circumvention hacks. The allocate method has been added to allow proper bypassing of the constructor.

 o = Object::const_get( class_name ).allocate
 class_props.each_pair { |k,v| o.instance_eval "@#{k} = v" }
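To see what allocate actually skips, here is a minimal sketch. The Account class and its fields are made up for illustration, and I've used instance_variable_set in place of instance_eval only to keep things short:

```ruby
# Hypothetical class, invented for this example only.
class Account
  attr_reader :owner, :balance, :log

  def initialize( owner )
    @owner = owner
    @balance = 0
    @log = ["constructor ran"]
  end
end

# The normal route runs the constructor.
a = Account.new( "Bill" )

# allocate hands back a blank instance: initialize never runs,
# so no instance variables exist yet.
b = Account.allocate
blank_ivars = b.instance_variables

# Populate it from a Hash of properties, as a loader or serializer might.
class_props = { "owner" => "Carol", "balance" => 50 }
class_props.each_pair { |k, v| b.instance_variable_set( "@#{k}", v ) }

puts a.log.inspect
puts blank_ivars.inspect
puts b.owner
puts b.log.inspect
```

The allocated object is a real Account, but anything the constructor would have set up (here, the log) is simply missing until we put it there ourselves.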

This method also comes with an accompanying API call for extensions:

 VALUE rb_obj_alloc( VALUE klass )

Duck typing and to_str

If there is a primary thrust to Ruby 1.8.0 (and the releases building up to it), it is duck typing. We differentiate our objects by the methods that they possess. You should see more and more respond_to? in Ruby code rather than kind_of?. This concept is a core mantra of Ruby, right alongside the principle of least surprise (POLS).

Whereas the phrase "duck typing" doesn't appear in 1.6.x era versions of the PickAxe, you can see it popping up all over ruby-talk: [53644], [56614], [76059]. (You might also check out Types in Ruby.)

Let's take a bit of code we might have used previously in Ruby:

 # Read entire contents of a file
 def read_file( file_name )

   # Ok. It's a file name.
   if file_name.is_a? String
     File.open( file_name ).read

   # But what if it's an IO object?
   # Let's read from it!
   elsif file_name.is_a? IO
     file_name.read

   end
 end

The above code is trying to abstract away the reading of data by handling read operations inside the method. Since we're so used to Java and Python techniques, we tend to identify an object based on what classes it descends from. We look at the file_name var and figure that if we check for descent from a String class, then we are covered if someday we decide to extend String class on our own and use that as our file_name.

With to_str, Matz is giving us a simpler way of demonstrating that our classes can be used as strings directly. Nearly all builtin methods use to_str to determine if an object is (or can be used as) a String. Think of it: if you extend the String class, then you have to write alternate methods (sub!, length, append, etc.) tailored to your needs.

Instead, simply write a to_str method and we can treat such an object like a String in our example:

 def read_file( file_name )
   if file_name.respond_to? :to_str
     File.open( file_name ).read
   elsif file_name.respond_to? :read
     file_name.read
   end
 end

So why not use to_s? Because to_s coerces objects into strings. So, to_str is an implicit cast, whereas to_s is an explicit cast.

Think of timestamps. We want to be able to easily convert a timestamp into a String for printing:

 puts "Time.now: " + Time.now.to_s
 #=> Time.now: Mon Aug 04 13:37:43 MDT 2003

But we don't want to load a File based on a timestamp:

 File.open( Time.now )
 #=> TypeError: cannot convert Time into String
     from (irb):3:in `initialize'
     from (irb):3:in `open'
     from (irb):3

Generally, we just don't need a timestamp to act as a String. So we use to_s to explicitly convert it.

If you're having a hard time remembering which is which, I would remember that there is a reason that to_s is shorter. First, it implies that the object isn't really much of a string, so we're only using the first letter 's'. Also, to_s is shorter because more objects will have to_s methods, so you'll end up typing it more frequently.

With to_str, we're tagging an object as much closer to being a string, so we give it the first three letters. It's almost half of a string!
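If the difference still feels slippery, a small sketch may help. Both classes below are hypothetical, invented for this example; the only builtin behavior relied upon is that String#+ asks its argument for to_str:

```ruby
# Both classes are hypothetical, made up for this example.

# A path-like object that is happy to stand in for a String.
class LogPath
  def initialize( name ); @name = name; end
  def to_str; "/var/log/#{@name}.log"; end
end

# An object with only the explicit conversion.
class Ticket
  def initialize( id ); @id = id; end
  def to_s; "ticket no. #{@id}"; end
end

path = LogPath.new( "mail" )
ticket = Ticket.new( 7 )

# String#+ asks its argument for to_str, so LogPath just works.
implicit = "reading " + path

# Ticket only has to_s, so the same call is refused...
refused = begin
  "reading " + ticket
  false
rescue TypeError
  true
end

# ...until we convert it ourselves.
explicit = "reading " + ticket.to_s

puts implicit
puts refused
puts explicit
```

LogPath never descends from String, yet it can be handed to anything that wants one. Ticket has to be converted by hand, which is exactly the behavior we wanted from Time.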

[RCR:41] Class.new and Module.new each take a block

If a block is passed to Class.new or Module.new, the block is executed within the context of the class or module. This is great for creating anonymous classes and modules without needing to call eval. Entire anonymous classes and modules can now be created with a syntax that isn't far off from the normal class and module declarations.

 m = Module.new do
       def test_me
        "called <module>::test_me" 
       end
     end

 class NewTest; end
 NewTest.extend m

 NewTest.test_me
 #=> "called <module>::test_me" 

As for creating anonymous classes:

 c = Class.new do
       def test_me
         "called <klass>::test_me" 
       end
     end

 c1 = c.new
 c1.test_me
 #=> "called <klass>::test_me" 

Fully qualified names (Foo::Bar)

Here's a small but significant change. Ruby 1.8 now allows you to declare classes and other constants using the full path to the constant. Previously, you had to surround such declarations with the module declaration. They also had to be nested for each module declaration.

Here's a declaration for the Foo::Bar class:

 module Foo
   class Bar; end
 end

Modules must still be declared in 1.8 as shown above. But we can now add methods to the Foo::Bar class without nesting the method declaration in a module declaration:

 class Foo::Bar
   def baz; end
 end

The old syntax is still valid and acceptable:

 module Foo
   class Bar
     def baz; end
   end
 end

Proc subtleties

I will mention a few changes to Proc, simply because it's such a useful construct and slight changes in behavior help indicate its future.

With respect to return and break, you used to treat a proc (or lambda) the same as a block. Let's examine the following code:

 def proc_test( num )
   3.times do |i|
     return i if num == i
   end
   return 0
 end
 proc_test( 2 )
 #=> 2

In the above, we have a return inside the iterating block. It makes perfect sense for a block to return from the caller. Blocks have fairly transparent scoping. (By the way, if you haven't noticed, everyone has their own ideas about how block scoping should work.)

In the case of a proc (or lambda), Ruby is beginning to protect their scope more than with a block. In Ruby 1.8.0, both break and return exit the scope of the proc, but do not exit the scope of the caller.

 def proc_test( num )
   p = proc do |i|
         return i if num == i
       end
   3.times do |x|
     p.call( x )
   end
   return 0
 end
 proc_test( 2 )
 #=> 0 

In 1.6.8, proc_test( 2 ) would return 2. You can see how the proc is becoming less like a block and more like an anonymous method.
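Here is a sketch of the "anonymous method" behavior, written with lambda rather than proc, since the two have not always behaved alike from one Ruby version to the next. The method name lambda_test is my own:

```ruby
def lambda_test( num )
  # return inside a lambda only leaves the lambda itself
  checker = lambda do |i|
    return i if num == i
    nil
  end

  seen = []
  3.times do |x|
    seen << checker.call( x )
  end

  # We always get here, even when the lambda "returned" early.
  [seen, 0]
end

seen, result = lambda_test( 2 )
puts seen.inspect
puts result
```

The early return shows up as the value of call (the 2 at the end of seen), while the surrounding method carries on to its own last line.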

LocalJumpError: return from proc-closure

I'd also like to mention a fix that was made in 1.6.8, but still bites me from time to time. I'm sure many of you will encounter this, not sure what to make of it.

Suppose we have an event handling system in our GUI system. A system for handling mouse clicks. We have a method (simulate_click) which we can use to test calling all the click handlers and a method (add_click_handler) for introducing new handlers. We'll also add a click handler which catches the click if it leaves the upper corner of the screen and prevents the event from bubbling.

 def simulate_click( x, y )
   @click_handlers ||= []
   @click_handlers.each do |h|
     return false unless h.call( x, y )
   end
   return true
 end

 def add_click_handler( &block )
   @click_handlers ||= []
   @click_handlers << block
 end

 add_click_handler do |x, y|
   return ( x > 25 && y > 25 )
 end 

Looks harmless? Well, this code is just dying to break. Any call to simulate_click will throw a LocalJumpError.

 simulate_click( 60, 70 )
 #=> LocalJumpError: return from proc-closure
     from (irb):78:in `call'
     from (irb):78:in `simulate_click'
     from (irb):77:in `each'
     from (irb):77:in `simulate_click'
     from (irb):81

Our problem is that we're dealing with an orphaned block. Several situations can create an orphaned block, but the most common is to receive a block through a method call, assign it to a variable and use it outside of the original scope. The same happens if a block crosses to another thread. In either case, it becomes difficult to tell how that return was intended. (Also, avoid using break or retry directly inside of an orphaned block.)

So how do we fix our script? Get rid of return!

 add_click_handler do |x, y|
   x > 25 && y > 25
 end 

Alternatively, pass in a Proc to give the return some context. (Note that just performing a to_proc conversion on an orphaned block won't do the trick.)

The moral of the story is: think about what you're doing when you use return, break or retry in your code. For iterators, don't use the above technique. Rather, use block_given? and yield, which prevent the block from becoming orphaned.
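Both suggestions can be sketched together. The ClickSystem class and its each_handler iterator are hypothetical names of my own, and this is only one way to arrange things: handlers come in as lambdas, and the iteration is done with block_given? and yield.

```ruby
# A hypothetical rewrite of the click system, for illustration.
class ClickSystem
  def initialize
    @click_handlers = []
  end

  # Handlers arrive as ready-made callable objects instead of bare blocks.
  def add_click_handler( handler )
    @click_handlers << handler
  end

  # An iterator in the yield style: the block never leaves this call,
  # so it can't be orphaned.
  def each_handler
    return @click_handlers.length unless block_given?
    @click_handlers.each { |h| yield h }
  end

  def simulate_click( x, y )
    each_handler do |h|
      return false unless h.call( x, y )
    end
    true
  end
end

gui = ClickSystem.new

# return is safe here: a lambda gives it a context of its own.
gui.add_click_handler( lambda { |x, y| return ( x > 25 && y > 25 ) } )

puts gui.simulate_click( 60, 70 )
puts gui.simulate_click( 10, 70 )
puts gui.each_handler
```

The return inside simulate_click's block is fine as well, since that block is still running inside the method that created it.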

Builtin Class Changes

Now, let's cover some new methods found in Ruby's core classes. Pay special attention to the additions in the Array class. Many of those methods will become a crucial part of your development, should you learn them.

Object#initialize_copy

Don't get the wrong idea about Ruby's new allocate method (described above). No one's picking on constructors. Constructors have a solid spot in Ruby's future. In fact, a new constructor has been added to every Object.

In Ruby, the initialize method is called when an object is created with new:

 class Person
   attr_accessor :name, :company, :phone, :created_at
   def initialize( name, company, phone )
     @name, @company, @phone, @created_at = 
      name, company, phone, Time.now
   end
 end

 bill = Person.new( 'Bill Bobson', 
   'The Mews at Windsor Heights', '801-404-1200' )
 bill.created_at
 #=> Tue Aug 05 14:09:52 MDT 2003

However, when an Object is copied with clone or dup, the constructor is skipped. This thwarts our timestamp mechanism above. What if we want to copy the basic data for the office, but reset the creation date and leave the name blank?

The initialize_copy constructor gets called on a clone or dup. Ruby passes in the object to copy from. We can pick and choose what data we want to keep. Initialize some data on our own.

So, rather than starting with a duplicate and stripping out data, we assemble a blank object with the pieces we're copying.

 class Person
   def initialize_copy( from )
      @company, @phone, @created_at = 
        from.company, from.phone, Time.now
   end
 end

 carol = bill.dup
 carol.name = 'Carol Sonbob'
 carol.created_at
 #=> Tue Aug 05 14:10:24 MDT 2003

Enumerable#inject: Building while iterating

This method is easily the most anticipated addition to the core classes. Inject allows you to introduce a single value into the scope of an iterating block. Each value returned by the block is introduced on the successive call.

We'll start with a simple example:

 [1, 2, 3].inject( "counting: " ) { |str, item| str + item.to_s }
 #=> "counting: 123" 

See if you can figure out how the above works without my explanation. Ask yourself: how does the string get passed into the block? How do the numbers get added to the string? And how does the block return the full string?

With inject, we supply the method with a value which will accompany us as we iterate through the block. This injected value is passed into the block as the first parameter. (In the above: str) The second parameter is a value from the object we're iterating through.

The injected value is really only used on the first pass through the iterator. At the end of the first pass, inject keeps the return value of the block and injects it into the block on the second pass. The return of the second pass is injected into the third pass, and so on.

To be clear, let's inspect the block parameters on our inject call:

 [1, 2, 3].inject( "counting: " ) do |str, item| 
   puts [str, item].inspect
   str + item.to_s
 end

 # ["counting: ", 1]
 # ["counting: 1", 2]
 # ["counting: 12", 3]
 #=> "counting: 123" 

You can see the string building. The evolution of the injected value. The inject method is great if you are building a single value from the contents of any Enumerable type. Its uses are numerous and it has been said that inject could replace using most of Enumerable's other methods.
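To give that last claim some weight, here is a rough sketch of a few familiar Enumerable methods rebuilt on top of inject. These are illustrations only, not how Ruby implements them:

```ruby
nums = [1, 2, 3, 4]

# A running total: the injected value starts at 0.
sum = nums.inject( 0 ) { |total, n| total + n }

# Enumerable#map, rebuilt with inject: the injected value is a new Array.
doubled = nums.inject( [] ) { |list, n| list + [n * 2] }

# Enumerable#select, same idea. The block must always hand back
# the accumulator, even when nothing is added to it.
evens = nums.inject( [] ) { |list, n| n % 2 == 0 ? list + [n] : list }

# Enumerable#max, carrying the best candidate so far.
largest = nums.inject( nums.first ) { |best, n| n > best ? n : best }

puts sum
puts doubled.inspect
puts evens.inspect
puts largest
```

The one rule to keep in mind: whatever the block returns becomes the injected value for the next pass, so forgetting to return the accumulator is the usual way to break an inject.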

Enumerable#sort_by: Faster, simpler sorting

The existing Enumerable#sort method leverages a block to sort values. Two values are handed to the block and the block must compare the two items. This process can be expensive, as in the following example:

 ["here", "are", "test", "strings"].sort { |a,b| a.hash <=> b.hash }
 #=> ["test", "strings", "here", "are"]

In the above block, a number of unnecessary hashes are generated. This is evidenced if we print out the hashes as they are generated:

 ["here", "are", "test", "strings"].sort do |a,b| 
   puts ah = a.hash
   puts bh = b.hash
   ah <=> bh
 end

 # 431661103
 # -914358341
 # -914358341
 # -890696794
 # 431661103
 # -890696794
 # 834120758
 # -890696794
 # 834120758
 # 431661103
 #=> ["test", "strings", "here", "are"]

Ten hashes in all are generated by sort as it works to compare these values against each other. In addition, Enumerable#sort can be difficult to master as the return value must be -1, 0, or 1, signifying less than, equal to and greater than, respectively. I'm sure that a certain two of those are frequently confused by newcomers.

The Enumerable#sort_by method performs the now-infamous Schwartzian transform by simply asking you to generate values which can be used for sorting. In the above case, we really just want to use a string's hash for sorting, so let's return the hash to sort_by, which can do the rest of the work for us.

 ["here", "are", "test", "strings"].sort_by { |a| a.hash }
 #=> ["test", "strings", "here", "are"]

Much more compact. Much more efficient. Nothing fancy to remember.
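For the curious, the transform can be spelled out by hand. This is a sketch of the general technique, not of Ruby's internals. I've swapped the hash for the string's length so the result is predictable, and the calls counter is only there to show how often the key gets computed:

```ruby
words = ["here", "are", "strings", "to"]

# Count how often the (pretend-expensive) key is computed.
calls = 0
key = lambda { |w| calls += 1; w.length }

# 1. Decorate: pair each value with its key, computed exactly once.
decorated = words.map { |w| [key.call( w ), w] }

# 2. Sort on the precomputed keys alone.
sorted_pairs = decorated.sort { |a, b| a[0] <=> b[0] }

# 3. Undecorate: throw the keys away again.
by_hand = sorted_pairs.map { |pair| pair[1] }
hand_calls = calls

# sort_by wraps up the same three steps.
calls = 0
built_in = words.sort_by { |w| key.call( w ) }

puts by_hand.inspect
puts built_in.inspect
puts hand_calls
puts calls
```

Four words, four key computations, either way. Compare that with the ten we counted for plain sort.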

Using sort_by with objects

Harry Ohlsen has contributed a neat bit of code, demonstrating how sort_by can be used to sort objects elegantly. This code is so simple and readable that I had to include it here for your affections.

Assuming the Person class introduced above in the initialize_copy section:

 persons = [
   Person.new( 'Roger Andies', 'IBM', '456-101-2345' ),
   Person.new( 'Bill Bobson', "Carl's Jr.", '608-121-0001' ),
   Person.new( 'Bill Bobson', 
     'The Mews at Windsor Heights', '466-404-1200' ),
   Person.new( 'Harvey Winston', 'ARUP Labs', '707-255-1212' )
 ]

We can then sort this Array of Objects by providing sort_by with a list of the properties to sort by, in order of precedence:

 persons.sort_by { |p| [p.name, p.company, p.created_at] }

Amazingly simple! The above code will return a list of Person objects, sorted first by name, then by company, then by creation date. So in the case of the dueling Bill Bobsons, the Bill Bobson with the alphabetically earlier company name will prevail.
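The reason this works is that Arrays know how to compare themselves, element by element, through Array#<=>. A small sketch, using a made-up Entry struct as a stand-in for Person:

```ruby
# sort_by compares the Arrays our block returns with Array#<=>,
# which walks both Arrays element by element.
same_name = ["Bill Bobson", "Carl's Jr."] <=> ["Bill Bobson", "The Mews at Windsor Heights"]
diff_name = ["Roger Andies", "IBM"] <=> ["Bill Bobson", "The Mews at Windsor Heights"]

# A stand-in for the Person class (Entry is a made-up name).
Entry = Struct.new( :name, :company )
entries = [
  Entry.new( "Roger Andies", "IBM" ),
  Entry.new( "Bill Bobson", "The Mews at Windsor Heights" ),
  Entry.new( "Bill Bobson", "Carl's Jr." )
]
sorted = entries.sort_by { |e| [e.name, e.company] }
companies = sorted.map { |e| e.company }

puts same_name
puts diff_name
puts companies.inspect
```

When the names tie, the comparison moves on to the company. When the names differ, the company is never consulted.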

Enumerable#any?

The any? method checks an Enumerable to see if any of its values can meet a comparison. This comparison is contained within a block.

To see if any members of an Array meet a regular expression:

 ["Mr. Janus", "Mr. Telly", "Ms. Walters"].any? { |x| x =~ /^Ms\./ }
 #=> true

Enumerable#all?

The all? method checks an Enumerable to see if all of its values can meet a comparison. Like any?, the comparison is expressed by a block.

To see if all members of an Array meet a regular expression:

 ["Mr. Janus", "Mr. Telly", "Ms. Walters"].all? { |x| x =~ /^Ms\./ }
 #=> false

[RCR:16] Array#partition

Here's a great method for sorting data. You can almost think of this as an expansion of Array#reject which returns separate Arrays for both the accepted and rejected data.

For example, in my YAML testing suite, I execute about 150 tests and results are returned to me in the form of an Array of Hashes. Each hash has a success key, which indicates the tests that pass and fail.

 # Load my test results
 tests = YAML::load( `ruby yts.rb` )

 # Separate tests into successes and fails
 success, fail = tests.partition { |t| t['success'] }

I now have a list of all successful and failing tests. This is exactly the code that I'll be using to generate HTML results for my tests.
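Since you don't have my test suite handy, here is the same idea with some made-up results, alongside the select and reject calls that partition replaces:

```ruby
# Made-up results standing in for the output of a test suite.
tests = [
  { 'name' => 'spec_seq',     'success' => true  },
  { 'name' => 'spec_map',     'success' => false },
  { 'name' => 'spec_anchors', 'success' => true  },
  { 'name' => 'spec_folded',  'success' => nil   }
]

# One pass, two Arrays: block-true items first, the rest second.
passed, failed = tests.partition { |t| t['success'] }

# The long way around, with two passes over the data.
passed2 = tests.select { |t| t['success'] }
failed2 = tests.reject { |t| t['success'] }

puts passed.map { |t| t['name'] }.inspect
puts failed.map { |t| t['name'] }.inspect
```

Note that both false and nil land in the second Array, so a test with no recorded result counts as a failure here.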

Array#transpose

The new transpose method basically reverses the dimensions of a two-dimensional Array. Given an Array a1 and its transposed counterpart Array a2: a1[0][1] becomes a2[1][0], a1[1][0] becomes a2[0][1] and a1[0][0] is a2[0][0].

 # A simple two-dimensional array
 [[1,2,3],[3,4,5]].transpose
 #=> [[1, 3], [2, 4], [3, 5]]

 # A more complex three-dimensional array
 [
  [[1, 2, 3], [:a, :b, :c]], 
  [[4, 5, 6], [:d, :e, :f]], 
  [[7, 8, 9], [:g, :h, :i]]
 ].transpose
 #=> [[[1, 2, 3], [4, 5, 6], [7, 8, 9]], 
    [[:a, :b, :c], [:d, :e, :f], [:g, :h, :i]]]

Array#zip

Often Arrays are merged horizontally with means such as Array#concat. The concat method appends one Array onto another:

 [1, 2].concat( [3, 4] )
 #=> [1, 2, 3, 4]

With zip, the Arrays are merged side-by-side. This can be thought of as merging Arrays vertically, to give a new Array with an added dimension.

 [1, 2].zip( [3, 4] )
 #=> [[1, 3], [2, 4]]

Think of two packages of cookies. Dark and light cookies, each in cylindrical plastic wrappers. The cookies are taken out and stacked next to each other on a counter top. This way, if someone is getting ready for a party, they could easily remove the top two cookies (one light, one dark) and place them on a plate.

The zip method is handy for placing Arrays side-by-side in a stack, so sections of these Arrays can be handled together.

Let's say we've developed a machine, a Ruby-powered robotic maid, which can sort through our milk and cookies and create snack plates for us at night. Here's the program we'll execute to give her the complete list:

 milk = [:milk1, :milk2, :milk3]
 light = [:light1, :light2, :light3]
 dark = [:dark1, :dark2, :dark3]

 milk.zip( light, dark )
 #=> [[:milk1, :light1, :dark1],
    [:milk2, :light2, :dark2],
    [:milk3, :light3, :dark3]]

At last snack time is free of the bureaucracy and organizational turmoil that has plagued it for years!
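If you've read the transpose section, you may have noticed a family resemblance. A short sketch of how the two relate, reusing the snack data from above:

```ruby
milk  = [:milk1, :milk2, :milk3]
light = [:light1, :light2, :light3]
dark  = [:dark1, :dark2, :dark3]

# zip lines the Arrays up side by side...
plates = milk.zip( light, dark )

# ...which is the same as stacking them as rows and transposing.
same = ( plates == [milk, light, dark].transpose )

# Transposing the plates gives us the original packages back.
packages = plates.transpose

# The zipped rows can be unpacked straight into block parameters.
menu = plates.map { |m, l, d| "#{m} with #{l} and #{d}" }

# zip pads with nil when an argument runs short.
short = [1, 2, 3].zip( [:a] )

puts same
puts menu.first
puts short.inspect
```

The padding is the practical difference: zip always produces one row per element of the receiver, whatever the lengths of the other Arrays.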

[RCR:132] Hash#merge, Hash#merge!

The Hash#merge method works like Hash#update, except that merge returns a new Hash and leaves the original untouched. This is great for inheriting pairs from several Hashes.

Say we want to set up a few Hashes with some defaults and create a new Hash with the overriding values from an incoming Hash:

 def make_point_hash( point )
   center = { :x => 1, :y => 2 }
   big = { :r => 10 }
   center.merge( big ).merge( point ) 
 end
 make_point_hash( { :x => 20 } )
 #=> {:y=>2, :r=>10, :x=>20}

Previously, this was done with hsh.dup.update. The Hash#merge! method is a preferable alias for Hash#update, indicating the destructive nature of an update.
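A quick sketch of the destructive and non-destructive flavors side by side. The variable names are my own:

```ruby
defaults = { :x => 1, :y => 2 }
incoming = { :x => 20 }

# merge builds a new Hash; on a key clash the argument's value wins.
point = defaults.merge( incoming )
after_merge = defaults.dup   # snapshot: defaults has not changed

# The older spelling of the same thing.
old_style = defaults.dup.update( incoming )

# merge! changes the receiver in place.
defaults.merge!( incoming )

puts point.inspect
puts after_merge.inspect
puts defaults.inspect
```

After the merge!, defaults has picked up the new :x for good, which is just what the exclamation point is warning you about.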

[RCR:140,23] Range#step

Ranges are an odd object really. An object that represents many objects. Stands here in place of a broad set of numbers so they don't all have to be present for roll call.

The Range#step method adds a lot of extra functionality to the Range class. I venture to say that it will become one of the most highly used Range methods in the world!

In Ruby 1.6, we had stepping with Integers:

 0.step(360, 45) {|angle|
   puts angle
 }

Can you tell which of the above parameters is the limit and which is the step? Sure, it's not too hard. You might say let's start at zero and step to three-sixty with a stride of forty-five. The readability is slightly hindered by the method call coming between the 0 and the 360.

Try Range#step now:

 (0..360).step(45) {|angle|
   puts angle
 }

Which reads: from zero to three-sixty, let's take steps of forty-five. This is a small change, but it certainly helps to give increased purpose to our core classes.
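A sketch of how the end points behave, collecting values into Arrays rather than printing them:

```ruby
# Collect the angles instead of printing them.
angles = []
(0..360).step( 45 ) { |angle| angles << angle }

# The Integer#step spelling visits the same values.
old_angles = []
0.step( 360, 45 ) { |angle| old_angles << angle }

# An exclusive Range (three dots) leaves the end point out.
open_ended = []
(0...360).step( 90 ) { |angle| open_ended << angle }

# If the stride doesn't land on the end point, we stop short of it.
uneven = []
(0..10).step( 4 ) { |n| uneven << n }

puts angles.inspect
puts open_ended.inspect
puts uneven.inspect
```

The Range decides where the walk starts and ends, and step only decides the stride.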

[RCR:139] MatchData#captures

Ruby has excellent support for regular expressions, but we're still working on giving Ruby its own angle on them. Matches from a string are returned as MatchData objects, which can be read as an Array.

 text = "name: Jen" 
 matches = /^(\w+): (\w+)$/.match( text )
 # matches[0] = "name: Jen", matches[1] = "name", matches[2] = "Jen" 

With enough regular expressions in your code, you might tire of keeping track of the index for each regular expression group. This RCR mandated MatchData#captures, which returns an array of the captured groups from a match.

 text = "name: Jen" 
 matches = /^(\w+): (\w+)$/.match( text )
 if matches
   key, value = matches.captures
   # key = "name", value = "Jen" 
 end

String#[regexp,n]: A fun Regexp quickie

Frankly, Ruby's Regexp support rules. For example, you can pass a Regexp into a String as if it were an Array index. Ruby will check for a match.

 "cat"[/c/]
 #=> "c" 
 "cat"[/z/]
 #=> nil

With Ruby 1.8.0, you can pass in an optional second argument which will return the content of the nth matching group. How nice!

 re_phone = /(\d{3})-(\d{3})-(\d{4})/
 "986-235-1001"[re_phone, 2]
 #=> "235" 

[RCR:69] Inspect with %p in sprintf

For those who use the sprintf or String#% syntax, you can now print an object inspection with the %p parameter.

 hsh = {'x'=>1, 'y'=>1}      
 puts "Hash is: %p." % hsh
 #=> Hash is: {"x"=>1, "y"=>1}.

Standard Library

Matz has started to open up the Ruby standard library to include support for XML, XML-RPC, SOAP, YAML, OpenSSL, unit testing, distributed computing, and much more. These libraries allow Ruby to provide a great deal of functionality out-of-the-box. These libraries are also guaranteed a long life and greater support.

I would like to emphasize that the decision to include these libraries in the core distribution is my favorite part of Ruby 1.8.0. We are getting closer to providing a complete toolkit for application development. We still offer a smaller set of libraries than other scripting languages, but these libraries are of incredible quality and utility.

I'm going to go through a few of these libraries, giving some sample code and pointers to where documentation can be had.

REXML: A pure Ruby XML library

REXML is an XML library of the highest order. It is simple to use, full of features, faithful to Ruby's ideals and quite swift. Many who have been frustrated by the design of other XML libraries find complete satisfaction in REXML. Allow me a short demonstration.

One of REXML's greatest features is its XPath support. Let's suppose we have an XML document stored in a string, such as:

 xmlstr = <<EOF
   <mydoc>
     <someelement attribute="nanoo">Text, text, text</someelement>
   </mydoc>
 EOF

Now, let's load the above document into a REXML::Document object:

 require 'rexml/document'
 xmldoc = REXML::Document.new xmlstr

If we want to access the text in the /mydoc/someelement node, we can simply access the elements property with an XPath string between square brackets:

 xmldoc.elements['/mydoc/someelement'].text
 #=> "Text, text, text" 

Attributes can be accessed via the REXML::Attributes object:

 xmldoc.elements['/mydoc/someelement'].attributes['attribute']
 #=> "nanoo" 

You can do a surprising amount with knowledge of just the above techniques. I will leave you with one other before I hand you off to REXML's documentation.

The REXML::Elements#each method is useful for cycling through a set of matching XML nodes. Supposing we wanted to cycle through all someelement nodes:

 xmldoc.elements.each('/mydoc/someelement') do |ele|
   puts ele.text
 end

REXML also has APIs for creating XML, stream parsing, event-based (SAX) parsing, entity processing, and a wealth of encodings.
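As a taste of the creation side, here is a minimal sketch that builds the sample document from scratch and reads it back. It assumes the rexml library is installed and loadable in your Ruby:

```ruby
require 'rexml/document'

# Build the document from the section above, one node at a time.
doc = REXML::Document.new
root = doc.add_element( 'mydoc' )
ele = root.add_element( 'someelement', { 'attribute' => 'nanoo' } )
ele.text = 'Text, text, text'

# Serialize, then parse it again to check nothing was lost.
xmlstr = doc.to_s
reread = REXML::Document.new( xmlstr )

puts xmlstr
puts reread.elements['/mydoc/someelement'].text
puts reread.elements['/mydoc/someelement'].attributes['attribute']
```

The same XPath lookups from earlier work on the document we just built, which makes for a handy sanity check.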

For more information, I would suggest studying in the following order:

  1. REXML Tutorial: An introduction to REXML by its author.
  2. David Mertz' The REXML Library: An introduction to REXML and comparison to other XML parsing techniques.
  3. REXML FAQ: A handful of useful pointers.
  4. Full API Documentation (RDoc): Everything else.

YAML: YAML Ain't Markup Language

YAML is a simple, readable language for storing data. Ruby 1.8.0 introduces native support for loading and generating YAML.

A simple Array of Strings can be represented in YAML:

 - bicycle
 - car
 - scooter

Hashes as well:

 title: Ruby in a Nutshell
 author: Yukihiro Matsumoto
 publisher: O'Reilly and Associates

These are simple examples, though. YAML can handle a wide variety of Ruby objects and maintain its readability. YAML is a great solution for configuration files, adhoc protocols and serializing data between other scripting languages.

Incidentally, I am personally responsible for development of this particular library. The C code that powers Ruby's YAML support is called Syck and extensions which use the same parser are available for Python and PHP. This shared parser and emitter helps guarantee that data objects are interpreted the same by each extension.

Loading YAML documents is extremely simple:

 require 'yaml'
 obj = YAML::load( File.open( 'books.yml' ) )

If books.yml contains a hash, then a Ruby Hash will be returned. If the document contains a list, then a Ruby Array will be returned.

To turn the object back into YAML:

 require 'yaml'
 File.open( 'books.yml', 'w' ) do |f|
   f << obj.to_yaml
 end

Try it sometime in IRb. Instead of using Kernel::p to inspect your objects, try Kernel::y:

 require 'yaml'
 a = { 'time' => Time.now, 'symbol' => :Test, 'number' => 12.0 }

 y a
 # ---
 # number: 12.0
 # symbol: !ruby/sym Test
 # time: 2003-08-04 21:08:37.430417 -06:00
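If you'd rather not create a books.yml just to experiment, the documents can be loaded straight from Strings. A small sketch using the two sample documents from above and a round trip through to_yaml:

```ruby
require 'yaml'

list_doc = <<EOY
- bicycle
- car
- scooter
EOY

hash_doc = <<EOY
title: Ruby in a Nutshell
author: Yukihiro Matsumoto
publisher: O'Reilly and Associates
EOY

vehicles = YAML::load( list_doc )
book = YAML::load( hash_doc )

# Dump and reload: the round trip should give back equal objects.
round_trip = YAML::load( book.to_yaml )

puts vehicles.class
puts book.class
puts book['publisher']
puts round_trip == book
```

A list comes back as an Array, a mapping as a Hash, and dumping then reloading leaves us with an equal Hash.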

To learn YAML, I would suggest studying the following documents in order:

  1. YAML In Five Minutes: A quick beginning tutorial to the basics of YAML.
  2. YAML Cookbook: A side-by-side comparison of YAML objects and Ruby objects.
  3. YAML for Ruby Manual: A complete tutorial and API documentation covering YAML for Ruby.

WEBrick: Building HTTP Servers

WEBrick is a socket server toolkit now included with Ruby 1.8.0. The library has been in development for several years and has long been a boon to Ruby developers.

Here's a basic example web server:

 require 'webrick'

 s = WEBrick::HTTPServer.new(
    :Port     => 2000,
    :DocumentRoot => Dir::pwd + "/htdocs" 
 )

 trap( "INT" ) { s.shutdown }
 s.start

As you can see, WEBrick is a snap. In some simple benchmarks, I've found its file-serving to be comparable to Apache 1.3. WEBrick is a threading server, so it can handle a number of concurrent connections.

My favorite part of WEBrick is its pluggable architecture. You basically map (or mount) services to specific namespaces within a given server. Any request issued under that URI namespace is passed to the plugin.

Take this SOAP server as an example:

 require 'soaplet'
 srv = SOAP::WEBrickSOAPlet.new
 s.mount( "/soap", srv )

Now all requests sent to http://localhost:2000/soap/ will be processed by the SOAPlet.

There isn't much English documentation on either WEBrick or its compatriots, so you might have to dig through source code to accomplish more complicated endeavors.

Ruby/DL: Cross-platform Dynamic Linking

Honestly, my favorite part of developing in C is dynamic linking. It's so neat to load a shared object and shake hands with it and say, "Hey, there little guy. Welcome to the program." And the great thing about Ruby/DL is that you can do it all from Ruby.

I'll give you just a few examples and then refer you to ext/dl/doc/dl.txt in the Ruby 1.8.0 distribution, which documents much of what this module can do.

In this example, we're going to interface with the curl shared library. You'll be amazed how simple it is. One of the ways Ruby/DL could really benefit the Ruby community is by allowing developers to write Ruby extensions without writing them in C. Here we've got libcurl.so and we're going to interface with it directly.

 require 'dl/import'

 module Curl
   extend DL::Importable
   dlload "/usr/local/lib/libcurl.so" 
   extern "char *curl_version()" 
 end

 puts Curl.curl_version

We'll start with an easy one. Retrieving the version number. Libcurl has a curl_version() call, which returns a string. All we have to do is provide Ruby/DL with the function prototype and we're set!

I'm encapsulating the Curl API in a module called Curl. The extend DL::Importable introduces a number of methods for interfacing with the DLL. I load the shared object with dlload. Then, provide the prototype (minus the semicolon) to extern. Now the Curl module has a method called curl_version which can be used to retrieve the version!

Things start to get more complicated when dealing with C structures. But Ruby/DL has ways of handling structs, callbacks and even pointers!

Let's try adding a method for the curl_version_info(), which returns a struct.

 require 'dl/struct'

 module Curl
   VersionInfoData = struct [
     "int age",
     "char *version",
     "uint version_num",
     "char *host",
     "int features",
     "char *ssl_version",
     "long ssl_version_num",
     "char *libz_version",
     "char **protocols" 
   ]
   extern "void *curl_version_info(int)" 
 end

 ver = Curl::VersionInfoData.new( Curl.curl_version_info( 0 ) )
 puts "Curl version: " + ver.version
 puts "Built on " + ver.host.to_s
 puts "Libz version: " + ver.libz_version.to_s

In many cases, you may not even be modifying the struct that is returned from a DL call. Above we're casting a void pointer to a VersionInfoData struct with the new method. If you don't need to get into the nitty-gritty, then don't bother. Simply have the function prototype returning a void pointer and pass the return of a call into other calls.

I will also show you a simple demonstration in pointer math with Ruby/DL. The above VersionInfoData contains a list of supported protocols in a char pointer-pointer. This is an array of strings, ended with a NULL pointer. We'll use DL::sizeof to retrieve the size of a character pointer and loop until we hit NULL.

 puts "Supported protocols:" 
 (0..100).step( DL::sizeof('s') ) do |offset|
   protocol = ( ver.protocols + offset ).ptr
   break unless protocol
   puts ".. #{protocol}" 
 end

Now that's nifty. Pointer math can be done against the DL::PtrData class!

Now, if you want some great examples, head over to the Ruby/DL site. They've got a concise libxslt sample, a GTK+ sample, and a bunch of Win32API samples. I dare someone to write a whole extension in this. Seriously, I will send that person a free bathrobe.

StringIO

The StringIO class hardly needs an explanation. Long have Ruby developers bundled this class with their packages. By allowing you to read and write to a String like an IO object, the StringIO class keeps you from having to treat Strings and Files like separate creatures. Instead, require the StringIO class and handle everything with each, readlines, seek and all of your other favorite IO methods.

 require 'stringio'
 s = StringIO.new( <<EOY )
 .. string to read from here ..
 EOY

 s2 = StringIO.new
 s.readlines.each do |line|
   # Very basic stripping of HTML tags
   line.gsub!( /<[^>]+>/, '' )
   s2.write( line )
 end

StringIO can be especially helpful to those who are writing parsers (which includes the abundance of you who are writing templating engines!). Remember that, like other IO classes, StringIO keeps line number (StringIO#lineno) and character position (StringIO#pos) data, which is essential for error-reporting.
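As a rough sketch of that kind of error reporting, here is a tiny checker that reads "key: value" lines from a StringIO and records where the bad ones are. The check_pairs method and its line format are made up for this example; only lineno, pos, gets and eof? come from StringIO itself.

```ruby
require 'stringio'

# Hypothetical mini-parser: every line must look like "key: value".
# Returns an Array of error messages for the lines that do not.
def check_pairs(io)
  errors = []
  until io.eof?
    start = io.pos   # byte offset where this line begins
    line  = io.gets  # reading a line also bumps io.lineno
    unless line =~ /\A\w+:\s*\S/
      errors << "line #{io.lineno} (offset #{start}): expected 'key: value'"
    end
  end
  errors
end

src = StringIO.new("name: why\noops\nlang: ruby\n")
puts check_pairs(src)
```

Since a File answers to the same methods, the same checker should work unchanged on an open file.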

Also, many developers were using a pure Ruby version of StringIO for Ruby 1.6.8. In Ruby 1.8.0, StringIO is a C extension.

open-uri

I have been dying for this library to come up in the standard library. Most of you don't know it, but you can quit using Net::HTTP and Net::FTP. The open-uri library is a Ruby equivalent to wget or curl (the library mentioned in the previous section).

Basically, this module allows you to use the basic open method with URLs.

 require 'open-uri'
 require 'yaml'

 # The block's value is returned by open, so feed is still around afterwards
 feed = open( "http://www.whytheluckystiff.net/why.yml" ) do |f|
   YAML::load( f )
 end

The above script loads my YAML news feed into the feed variable. The block gets passed a StringIO object (as previously discussed) or, for larger responses, a Tempfile. Either one can be read by the YAML module. Ain't it lovely to see everything working together so nicely?

Seems too simple? No way. You have plenty of control over the sending of headers:

 open("http://www.ruby-lang.org/en/",
   "User-Agent" => "Ruby/#{RUBY_VERSION}",
   "From" => "foo@bar.invalid",
   "Referer" => "http://www.ruby-lang.org/") {|f|
   ...
 }

And there is plenty of metadata mixed in to the response:

 open("http://www.ruby-lang.org/en") {|f|
   f.each_line {|line| p line}
   p f.base_uri     # URI::HTTP http://www.ruby-lang.org/en/
   p f.content_type   # "text/html" 
   p f.charset     # "iso-8859-1" 
   p f.content_encoding # []
   p f.last_modified  # Thu Dec 05 02:45:02 UTC 2002
 }

I used a variant of this module in RAAInstall. It was essential. We could pull down all sorts of URLs from RAA and just pass them on to open-uri without worry. Saved quite a bit of time.

PP: The Pretty Printer

Every object in Ruby has an inspect method, which allows the contents of an object to be readably displayed at any time. A common way to inspect objects is to use the Kernel#p method, which prints an inspection of a Ruby object:

 >> p Hash['mouse', 0.4, 'horse', 12.3]
 {"horse"=>12.3, "mouse"=>0.4}

Above, the contents of a Hash are displayed. Strings are surrounded by quotes. Numbers and dates are formatted simply.
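Your own classes can join in by defining inspect themselves. A minimal sketch, with a made-up Animal class standing in for whatever you happen to be building:

```ruby
# Hypothetical class, used only for illustration.
class Animal
  def initialize(name, weight)
    @name, @weight = name, weight
  end

  # Kernel#p prints whatever this method returns.
  def inspect
    "#<Animal #{@name}: #{@weight}>"
  end
end

p Animal.new('mouse', 0.4)   # prints #<Animal mouse: 0.4>
```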

Unfortunately, complicated class structures can still be difficult to read at times. Here's a YAML news feed which, when printed with Kernel#p, becomes a mess of Arrays and Hashes, wrapped to fit my terminal:

 => {"modified"=>Wed Feb 05 12:29:29 UTC 2003, "language"=>"en-us", 
 "title"=>"My First Weblog", "issued"=>Wed Feb 05 12:29:29 UTC 2003
 , "author"=>{"name"=>"John Doe", "url"=>"/johndoe/", "email"=>"joh
 n.doe@example.com"}, "contributors"=>[{"name"=>"Bob Smith", "url"=
 >"/bobsmith/", "email"=>"bob.smith@example.com"}], "subtitle"=>"Ai
 n't the Interweb great?", "created"=>Wed Feb 05 12:29:29 UTC 2003,
 "link"=>"/johndoe/weblog/", "entries"=>[{"modified"=>Wed Feb 05 12
 :29:29 UTC 2003, "title"=>"My First Entry", "issued"=>Wed Feb 05 1
 2:29:29 UTC 2003, "id"=>"e34", "contributors"=>[{"name"=>"John Doe
 ", "role"=>"author", "url"=>"/johndoe/", "email"=>"john.doe@exampl
 e.com"}, {"name"=>"Bob Smith", "role"=>"graphical-artist", "url"=>
 "/bobsmith/", "email"=>"bob.smith@example.com"}], "summary"=>"A ve
 ry boring entry; just learning how to blog here...", "subtitle"=>" 
 In which a newbie learns to blog...", "content"=>[{"lang"=>"en-us" 
 , "data"=>"Hello, __weblog__ world! 2 < 4!\n"}, {"type"=>"text/html" 
 , "lang"=>"en-us", "data"=>"<p>Hello, <em>weblog</em> world! 2 &lt
 ; 4!</p>\n"}, {"type"=>"image/gif", "lang"=>"en-us", "data"=>"GIF8
 9a\f\000\f\000\204\000\000\377\377\367\365\365"}], "created"=>Wed 
 Feb 05 12:29:29 UTC 2003, "link"=>"/weblog/archive/45.html"}], "ba
 se"=>"http://example.com"}

The PrettyPrint module (pp) is a severe enhancement to the traditional inspect technique. The goal of the module is to enhance readability by throwing in some conservative whitespace to indicate hierarchy and perform wrapping of longer content.

The same document printed with pp:

 >> require 'pp'
 => true
 >> pp YAML::load( File.open( 'pie.yml' ) )
 {"modified"=>Wed Feb 05 12:29:29 UTC 2003,
  "language"=>"en-us",
  "title"=>"My First Weblog",
  "issued"=>Wed Feb 05 12:29:29 UTC 2003,
  "author"=>
  {"name"=>"John Doe", "url"=>"/johndoe/", "email"=>"john.doe@example.com"},
  "contributors"=>
  [{"name"=>"Bob Smith",
   "url"=>"/bobsmith/",
   "email"=>"bob.smith@example.com"}],
  "subtitle"=>"Ain't the Interweb great?",
  "created"=>Wed Feb 05 12:29:29 UTC 2003,
  "link"=>"/johndoe/weblog/",
  "entries"=>
  [{"modified"=>Wed Feb 05 12:29:29 UTC 2003,
   "title"=>"My First Entry",
   "issued"=>Wed Feb 05 12:29:29 UTC 2003,
   "id"=>"e34",
   "contributors"=>
    [{"name"=>"John Doe",
     "role"=>"author",
     "url"=>"/johndoe/",
     "email"=>"john.doe@example.com"},
    {"name"=>"Bob Smith",
     "role"=>"graphical-artist",
     "url"=>"/bobsmith/",
     "email"=>"bob.smith@example.com"}],
   "summary"=>"A very boring entry; just learning how to blog here...",
   "subtitle"=>"In which a newbie learns to blog...",
   "content"=>
    [{"lang"=>"en-us", "data"=>"Hello, __weblog__ world! 2 < 4!\n"},
    {"type"=>"text/html",
     "lang"=>"en-us",
     "data"=>"<p>Hello, <em>weblog</em> world! 2 &lt; 4!</p>\n"},
    {"type"=>"image/gif",
     "lang"=>"en-us",
     "data"=>"GIF89a\f\000\f\000\204\000\000\377\377\367\365\365"}],
   "created"=>Wed Feb 05 12:29:29 UTC 2003,
   "link"=>"/weblog/archive/45.html"}],
  "base"=>"http://example.com"}

In Hashes, pp attempts to keep keys and values on the same line. But if longer content is found, the value is placed on a new line. This sort of layout is possible through a flexible PP class used for string construction.

Whereas Ruby's inspect receives no arguments, your custom pretty_print method will receive an instance of the PP class, which allows you to give cues as to where to group or allow breaks in content. Here's an example from the Array#pretty_print method included with PP:

 class Array
   def pretty_print(pp)
     pp.group(1, '[', ']') {
       self.each {|v|
         pp.comma_breakable unless pp.first?
         pp.pp v
       }
     }
   end
 end
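The same hooks work in your own classes. Here's a small sketch with a made-up Feed class (the class and its fields are assumptions for the example); group, text, comma_breakable and pp are the methods the printer hands you.

```ruby
require 'pp'

# Hypothetical class, not part of the pp library.
class Feed
  def initialize(title, entries)
    @title, @entries = title, entries
  end

  # q is the printer object that pp passes in.
  def pretty_print(q)
    q.group(1, '#<Feed ', '>') {
      q.text @title.inspect   # literal text, never broken
      q.comma_breakable       # a comma, then a spot where a break is allowed
      q.pp @entries           # let the Array print itself
    }
  end
end

feed = Feed.new("My First Weblog", ["My First Entry", "Second Entry"])
pp feed
```

When everything fits on the line, the breakable spots come out as plain spaces. Hand PP.pp a narrower width as its third argument and the same method starts breaking at those spots instead.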

I think this is a great model for building strings. This same approach could be used effectively to build HTML, XML, or Textile from data structures. A very similar technique is used by Ruby's YAML emitter.

In fact, I'll also mention that if you've loaded the YAML module, you can use Kernel#y to print structures in YAML:

 >> require 'yaml'
 => true
 >> y YAML::load( File.open( 'pie.yml' ) )
 ---
 modified: 2003-02-05 12:29:29.000000 Z
 language: en-us
 title: My First Weblog
 issued: 2003-02-05 12:29:29.000000 Z
 author:
   name: John Doe
   url: "/johndoe/" 
   email: john.doe@example.com
 # .. etc ..

Un: (As in -run)

Ruby works quite well across platforms. I'm always surprised at how flawlessly my Ruby code runs from one platform to the next. Ten years ago cross-platform apps were an absolute joke! But now it's another pleasant reality for scripters.

The un library takes advantage of Ruby's cross-platform support to provide the common UNIX commands for all Ruby users. If you're on Windows, rather than installing Cygwin or MinGW, you can use UNIX commands through your Ruby 1.8.0 installation.

To execute from the commandline, type ruby -run -e, then the name of the command, then --, and finish with the options to the command.

Here is a list of un's included commands:

 ruby -run -e cp -- [OPTION] SOURCE DEST
 ruby -run -e ln -- [OPTION] TARGET LINK_NAME
 ruby -run -e mv -- [OPTION] SOURCE DEST
 ruby -run -e rm -- [OPTION] FILE
 ruby -run -e mkdir -- [OPTION] DIRS
 ruby -run -e rmdir -- [OPTION] DIRS
 ruby -run -e install -- [OPTION] SOURCE DEST
 ruby -run -e chmod -- [OPTION] OCTAL-MODE FILE
 ruby -run -e touch -- [OPTION] FILE

You could provide aliases in your environment for these to emulate the UNIX commands. Or you could use these to build cross-platform Makefiles or simple .bat/.sh scripts.

The neatest thing about un is how it works, though. Let's dissect the commandline.

The first section (ruby -run) simply starts the Ruby interpreter and requires the un library: -r is Ruby's require switch and un is the name of the library. Then, the -e cp option indicates that we want to execute the cp method (which is mixed in from the un library). The double-dash (--) indicates that all further options will be sent to ARGV (and hence to un). So, un simply reads the rest of the line from ARGV. Pretty clever!
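To make the trick concrete, here is a stripped-down sketch of the same idea. This is not un's actual source; tiny_cp and the tiny_un.rb file name are made up, and the real library also parses options. It only shows a top-level method that takes no arguments and pulls its work from ARGV.

```ruby
require 'fileutils'

# Hypothetical stand-in for a un-style command. It takes no arguments
# of its own; everything after the -- on the commandline lands in ARGV.
def tiny_cp
  src, dest = ARGV
  raise ArgumentError, "usage: tiny_cp SOURCE DEST" unless src && dest
  FileUtils.cp(src, dest)
end
```

Assuming this is saved as tiny_un.rb in the current directory, it could be run as: ruby -r./tiny_un -e tiny_cp -- a.txt b.txt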


by why the lucky stiff

august 18, 2004