RedHanded

The Story of Streaming HTTP Through MouseHole (With Subsequent Adventures in Continuations and Duck Typing)

by why in inspect

Hacking MouseHole has been considerably fun and challenging, especially since much of the work involves subclassing and reshaping existing classes. It’s like glass blowing: overriding methods here and there, while holding on to the underlying class as much as possible.

(If you’re new, MouseHole is an idea conceived by the readers of this blog: a scriptable proxy for rewriting the web and sharing little web applications. The wiki goes into detail.)

I’m sure one of the things that deters many from using MouseHole is the word proxy. No one wants to hinder their browsing with longer wait times while a proxy gets its act together. Cleaning the HTML, parsing it, rewriting it, rebuilding it. Sounds like a lot.

So I’ve been working on adding streaming support to the proxy, so that downloads get a progress bar in the browser and images, CSS, unhandled pages, etc. travel through the proxy quicker. It’s working well, a big improvement, but here are a couple of notes on what it took.

The Net::HTTP and WEBrick Cable

At heart, the WEBrick::HTTPProxy uses a bit of code to pass its request onto the web and then hand it back to the browser. Trimming out a bit of cruft, it looks like this:

 begin
   http = Net::HTTP.new(uri.host, uri.port, proxy_host, proxy_port)
   http.start{
     case req.request_method
     when "GET"  then response = http.get(path, header)
     when "POST" then response = http.post(path, req.body || "", header)
     when "HEAD" then response = http.head(path, header)
     else
       raise HTTPStatus::MethodNotAllowed,
         "unsupported method `#{req.request_method}'." 
     end
   }
 rescue => err
   logger.debug("#{err.class}: #{err.message}")
   raise HTTPStatus::ServiceUnavailable, err.message
 end

 # Convert Net::HTTP::HTTPResponse to WEBrick::HTTPResponse
 res.status = response.code.to_i
 choose_header(response, res)
 set_cookie(response, res)
 set_via(res)
 res.body = response.body

 # Process contents
 if handler = @config[:ProxyContentHandler]
   handler.call(req, res)
 end

So three things:

  1. Use Net::HTTP to retrieve the resource from the web.
  2. Convert the response to WEBrick’s own response object type.
  3. Pass the response to a handler, if one is set up.

In the case of MouseHole: yes, we’ve got a handler. Our handler checks to see if a MouseHole script wants to mess with the resource.

Now, how do we add streaming to this? The above code is using Net::HTTP to download the whole resource before moving on. We can’t have this. We want to just get the headers and let the handler decide what to do with the response body.

One other thing about WEBrick that I discovered: if you give it a response where the @body is an IO object, it’ll stream that object to the output. Perfect, right?
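Streaming an IO body comes down to reading it in fixed-size pieces and writing each piece out as it arrives, so only one chunk sits in memory at a time. Below is a minimal sketch of that loop. It assumes only an object whose read(len) returns nil at the end. stream_body is a hypothetical name and this is not WEBrick's own send_body_io. StringIO stands in for both the upstream body and the browser socket.

```ruby
require 'stringio'

# Hypothetical helper: copy a readable body to an output in fixed-size pieces.
# Only one chunk is held at a time. A sketch of the idea, not WEBrick's code.
def stream_body(io, out, chunk_size = 4096)
  chunks = 0
  while (buf = io.read(chunk_size)) # read(len) returns nil once the body is exhausted
    out.write(buf)
    chunks += 1
  end
  io.close if io.respond_to?(:close)
  chunks
end

upstream = StringIO.new("x" * 10_000) # stands in for the remote response body
browser  = StringIO.new               # stands in for the client socket
puts stream_body(upstream, browser)   # prints 3 (4096 + 4096 + 1808 bytes)
puts browser.string.bytesize          # prints 10000
```

A smaller chunk size means less memory per read but more writes, which is the usual trade-off for a proxy in the middle.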

Pulling the Thread That Goes Back in Time

Digging around in the Net::HTTP code was taking forever and I really wanted to get something working. So I thought I’d try wrapping a Generator around the whole thing.

In my subclass of the proxy, I overrode the method containing the above code with:

 # Generator comes from the Ruby 1.8 standard library: require 'generator'
 response = nil
 begin
   http = Net::HTTP.new(uri.host, uri.port, proxy_host, proxy_port)
   g = Generator.new do |g|
     http.start do
       chunkd = proc do |gres|
         g.yield gres
         gres.read_body do |gstr|
           g.yield gstr
         end
       end
       case req.request_method
       when "GET"  then http.request_get(path, header, &chunkd)
       when "POST" then http.request_post(path, req.body || "", header, &chunkd)
       when "HEAD" then http.request_head(path, header, &chunkd)
       else
         raise WEBrick::HTTPStatus::MethodNotAllowed,
           "unsupported method `#{req.request_method}'." 
       end
     end
   end

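   # Use the generator to mimic an IO object
   def g.read sz = 0; next? ? self.next : nil end   # next chunk, or nil when done
   def g.size; 0 end                                 # length isn't known up front
   def g.close; while next?; self.next; end end      # drain whatever is left
   def g.is_a? klass; klass == IO ? true : super(klass); end  # pass WEBrick's IO check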
 rescue => err
   logger.debug("#{err.class}: #{err.message}")
   raise WEBrick::HTTPStatus::ServiceUnavailable, err.message
 end

 # First yield: the response, with headers parsed but the body still unread
 response = g.next

 # Convert Net::HTTP::HTTPResponse to WEBrick::HTTPResponse
 res.status = response.code.to_i
 choose_header(response, res)
 set_cookie(response, res)
 set_via(res)
 res.body = g
 def res.send_body(socket)
   if @body.respond_to? :read
     send_body_io(socket)
   else 
     send_body_string(socket)
   end
 end

 # Process contents
 if handler = @config[:ProxyContentHandler]
   handler.call(req, res)
 end

This is a ridiculous hack. Maybe. When the generator is created, the code inside isn’t executed. It all gets skipped. But when I run g.next, the code whirs into motion. And the first yield passes us a response at the moment the headers are parsed, but before the body has been read.

The hackiness has to do with getting WEBrick to actually accept the generator as an IO object. Naturally, there has to be a read method, which just calls next to get chunks of the stream. Adding size and close methods made sense as well. But I actually had to override is_a? to let my duck-typed generator pass through. Ugghh.

Overall, it works well. I didn’t even notice much slowness. Until memory got gobbled up. Which happens rather quickly with continuations.
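For the record, Generator and its continuations were dropped from the standard library in Ruby 1.9. From 1.9 on, Enumerator#next provides the same external iteration, built on Fibers instead of callcc. Below is a sketch of the same trick on that footing. make_stream and Header are hypothetical. The canned header and chunks stand in for the real Net::HTTP call, so this only illustrates the shape of the hack.

```ruby
# Hypothetical header object; the real hack yielded a Net::HTTPResponse here.
Header = Struct.new(:code)

# Hypothetical stand-in for the Net::HTTP call: yields the headers first,
# then each body chunk, the same order the Generator version relied on.
def make_stream(code, chunks)
  stream = Enumerator.new do |y|
    y << Header.new(code)
    chunks.each { |c| y << c }
  end

  # Singleton methods let the enumerator pass as a readable body.
  def stream.read(_len = nil)
    self.next
  rescue StopIteration
    nil # end of stream, like IO#read(len) at EOF
  end

  def stream.close
    loop { self.next } # drain the rest; Kernel#loop stops on StopIteration
    nil
  end

  stream
end

stream = make_stream("200", %w[chunk1 chunk2 chunk3])
response = stream.next # the producer block runs only as far as its first yield
puts response.code     # prints 200

body = []
while (chunk = stream.read)
  body << chunk
end
puts body.join         # prints chunk1chunk2chunk3
```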

Punching Holes in Net::HTTP

Now that I had a decent angle on this, I decided to take a different approach: to rework Net::HTTP as an IO object. I would have loved to use OpenURI but it doesn’t support all the HTTP request methods. Also, http-access2 seemed to have the same problems as Net::HTTP.

The problem is that Net::HTTP will only work as a stream when it’s given a block. If no block is given, the whole stream is read into memory and returned.
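For comparison, here is the block form that Net::HTTP does stream with. request_get yields the response as soon as the status line and headers are parsed, and read_body with a block hands over the body segment by segment. The sketch uses a throwaway local server so it runs without touching the web. start_fixture_server and fetch_streaming are hypothetical helpers built on TCPServer. Notice that the stream only lives inside those two blocks, which is the scoping problem described above.

```ruby
require 'net/http'
require 'socket'

# Hypothetical test fixture: a one-shot HTTP server on an ephemeral local port.
def start_fixture_server(body)
  server = TCPServer.new('127.0.0.1', 0)
  port = server.addr[1]
  thread = Thread.new do
    client = server.accept
    # Skip the request line and headers, up to the blank line.
    loop do
      line = client.gets
      break if line.nil? || line == "\r\n"
    end
    head = "HTTP/1.1 200 OK\r\nContent-Length: #{body.bytesize}\r\nConnection: close\r\n\r\n"
    client.write(head + body)
    client.close
    server.close
  end
  [port, thread]
end

# Stream a GET: the outer block sees the response once the headers are parsed,
# and read_body's block sees the body piece by piece as it comes off the socket.
def fetch_streaming(port, path)
  status = nil
  chunks = []
  Net::HTTP.start('127.0.0.1', port) do |http|
    http.request_get(path) do |res|
      status = res.code
      res.read_body { |segment| chunks << segment }
    end
  end
  [status, chunks]
end

port, server = start_fixture_server("hello, mousehole")
status, chunks = fetch_streaming(port, '/')
server.join
puts status       # prints 200
puts chunks.join  # prints hello, mousehole
```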

The gist of the hack is this:

 require 'net/http'
 module Net
 class HTTPIO < HTTP
   # Send the request and parse the status line and headers, but leave
   # the body on the socket so the caller can pull it through #read.
   def request(req, body = nil, &block)
     begin_transport req
     req.exec @socket, @curr_http_version, edit_path(req.path), body
     begin
       res = HTTPResponse.read_new(@socket)
     end while HTTPContinue === res

     http = self   # #close needs the connection to finish the transport
     res.instance_eval do
       @req, @http = req, http
       def read len = nil; ... end
       def body; true end
       def close
         req, res = @req, self
         @http.instance_eval do
            end_transport req, res
            finish
         end
       end
       def size; 0 end
       def is_a? klass; klass == IO ? true : super(klass); end
     end

     res
   end
 end
 end

Since all the request methods route through Net::HTTP#request, it was just a matter of replacing that method. We’ve come to expect this in object-oriented languages. But what pushes the lever further in Ruby is that I can also redefine portions of the HTTPResponse object (using singleton methods) and reshape it without affecting its original class.
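Here is a small sketch of that singleton-method trick. A hypothetical FakeResponse class stands in for Net::HTTPResponse. A def inside instance_eval lands in that one object's singleton class, so the reshaped object gains read and size while the class and its other instances stay as they were.

```ruby
# Hypothetical response-like class, standing in for Net::HTTPResponse.
class FakeResponse
  attr_reader :body

  def initialize(body)
    @body = body
  end
end

reshaped = FakeResponse.new("streamed")
ordinary = FakeResponse.new("buffered")

# As in the hack above: a def inside instance_eval defines a singleton
# method, so only this one object changes.
reshaped.instance_eval do
  def read(_len = nil)
    body
  end

  def size
    0
  end
end

p reshaped.singleton_methods.sort      # prints [:read, :size]
p ordinary.respond_to?(:read)          # prints false
p FakeResponse.instance_methods(false) # prints [:body]
```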

In all, I feel like some of the same old good practices would have made the hack easier:

  • Please, no is_a? or === tests for IO objects. Duck type: use input.respond_to? :read.
  • If you have a streaming IO class, don’t require a block in order to stream. I might need to take that stream out of scope with me. Also, what if I need to start and stop the stream?
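To make the first point concrete, here is a sketch of a body sender that streams anything answering read and writes everything else as a string. send_body and ChunkSource are hypothetical names. ChunkSource is not an IO at all, yet it passes, because the only question asked is respond_to?(:read).

```ruby
require 'stringio'

# Hypothetical body sender: streams anything that answers #read and writes
# everything else as a string. There is no is_a?(IO) check anywhere.
def send_body(body, out)
  if body.respond_to?(:read)
    while (chunk = body.read(4096)) # nil signals the end of the stream
      out.write(chunk)
    end
  else
    out.write(body.to_s)
  end
  out
end

# Not an IO (nor a subclass of one), but it reads like one.
class ChunkSource
  def initialize(chunks)
    @chunks = chunks.dup
  end

  def read(_len = nil)
    @chunks.shift # nil once every chunk has been handed out
  end
end

puts send_body(ChunkSource.new(%w[a b c]), StringIO.new).string # prints abc
puts send_body("plain", StringIO.new).string                    # prints plain
```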

In the end, I’m left pretty guilty myself. I’ve hacked classes for my own use but they’re pretty worthless outside of MouseHole. I gotta find a way to push this back into the original classes or something.

said on 13 Oct 2005 at 14:46

Hey why, can you please add syntax colorizing to your code here? It sure would make it easier to read.

said on 13 Oct 2005 at 14:53

Well it is like they say, you can never predict the strange way customers will use your products. In this case you are the customer and the product is Net::HTTP. I agree that using respond_to? is better than is_a?, and clearly here we have a good example of why.

Ruby in particular is possibly more “vulnerable to customer hacks” than other languages, because of how the classes are open. So you might as well write your code to be as flexible as possible.

BTW, this is all pretty impressive with the streaming and all. I took a quick look in the Wonderland days and it seemed tricky.

said on 14 Oct 2005 at 10:07

baffle! wooaang!
