The Story of Streaming HTTP Through MouseHole (With Subsequent Adventures in Continuations and Duck Typing)
Hacking MouseHole has been considerably fun and challenging, especially since much of the work involves subclassing and reshaping existing classes. It’s like glass blowing. Overriding methods here and there, but holding on to the underlying class as much as possible.
(If you’re new, MouseHole is an idea conceived by the readers of this blog: a scriptable proxy for rewriting the web and sharing little web applications. The wiki goes into detail.)
I’m sure one of the things that deters many from using MouseHole is the word proxy. No one wants to hinder their browsing with longer wait times while a proxy gets its act together. Cleaning the HTML, parsing it, rewriting it, rebuilding it. Sounds like a lot.
So, I’ve been working on adding streaming support to the proxy, so that downloads get a progress bar in the browser and so that images, CSS, unhandled pages, etc. travel through the proxy quicker. It’s working well and is a big improvement, but here are a couple of notes on what it took.
The Net::HTTP and WEBrick Cable
At heart, the WEBrick::HTTPProxy uses a bit of code to pass its request onto the web and then hand it back to the browser. Trimming out a bit of cruft, it looks like this:
begin
  http = Net::HTTP.new(uri.host, uri.port, proxy_host, proxy_port)
  http.start{
    case req.request_method
    when "GET" then response = http.get(path, header)
    when "POST" then response = http.post(path, req.body || "", header)
    when "HEAD" then response = http.head(path, header)
    else
      raise HTTPStatus::MethodNotAllowed,
        "unsupported method `#{req.request_method}'."
    end
  }
rescue => err
  logger.debug("#{err.class}: #{err.message}")
  raise HTTPStatus::ServiceUnavailable, err.message
end

# Convert Net::HTTP::HTTPResponse to WEBrick::HTTPProxy
res.status = response.code.to_i
choose_header(response, res)
set_cookie(response, res)
set_via(res)
res.body = response.body

# Process contents
if handler = @config[:ProxyContentHandler]
  handler.call(req, res)
end
So three things:
- Use Net::HTTP to retrieve the resource from the web.
- Convert the response to WEBrick’s own response object type.
- Pass the response to a handler, if one is set up.
In the case of MouseHole: yes, we’ve got a handler. Our handler checks to see if a MouseHole script wants to mess with the resource.
Now, how do we add streaming to this? The above code is using Net::HTTP to download the whole resource before moving on. We can’t have this. We want to just get the headers and let the handler decide what to do with the response body.
One other thing about WEBrick that I discovered: if you give it a response where the @body is an IO object, it’ll stream that object to the output. Perfect, right?
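To make the idea concrete, here is a rough sketch of what streaming a body amounts to: read a chunk, write it to the client, repeat. This is not WEBrick's actual code. The `stream_body` helper, the chunk size, and the StringIO stand-ins are all assumptions made up for illustration.

```ruby
require 'stringio'

# Hypothetical helper, not part of WEBrick: copy a readable body to a
# socket-like object one chunk at a time instead of loading it whole.
def stream_body(body, socket, chunk_size = 4096)
  sent = 0
  while (chunk = body.read(chunk_size))
    break if chunk.empty?
    socket.write(chunk)
    sent += chunk.bytesize
  end
  sent
end

body   = StringIO.new("x" * 10_000)  # stands in for the upstream response
socket = StringIO.new                # stands in for the browser connection
sent   = stream_body(body, socket, 4096)
```

The only thing the loop asks of `body` is a `read` method that returns nil when the data runs out, which is why anything shaped like an IO can be dropped in.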
Pulling the Thread That Goes Back in Time
Digging around in the Net::HTTP code was taking forever and I really wanted to get something working. So I thought I’d try wrapping a Generator around the whole thing.
In my subclass of the proxy, I overrode the method containing the above code with:
response = nil
begin
  http = Net::HTTP.new(uri.host, uri.port, proxy_host, proxy_port)
  g = Generator.new do |g|
    http.start do
      chunkd = proc do |gres|
        g.yield gres
        gres.read_body do |gstr|
          g.yield gstr
        end
      end
      case req.request_method
      when "GET" then http.request_get(path, header, &chunkd)
      when "POST" then http.request_post(path, req.body || "", header, &chunkd)
      when "HEAD" then http.request_head(path, header, &chunkd)
      else
        raise WEBrick::HTTPStatus::MethodNotAllowed,
          "unsupported method `#{req.request_method}'."
      end
    end
  end

  # Use the generator to mimic an IO object
  def g.read sz = 0; next? ? self.next : nil end
  def g.size; 0 end
  def g.close; while next?; self.next; end end
  def g.is_a? klass; klass == IO ? true : super(klass); end
rescue => err
  logger.debug("#{err.class}: #{err.message}")
  raise WEBrick::HTTPStatus::ServiceUnavailable, err.message
end
response = g.next

# Convert Net::HTTP::HTTPResponse to WEBrick::HTTPProxy
res.status = response.code.to_i
choose_header(response, res)
set_cookie(response, res)
set_via(res)
res.body = g
def res.send_body(socket)
  if @body.respond_to? :read
    send_body_io(socket)
  else
    send_body_string(socket)
  end
end

# Process contents
if handler = @config[:ProxyContentHandler]
  handler.call(req, res)
end
This is a ridiculous hack. Maybe. When the generator is created, the code inside isn’t executed. It all gets skipped. But when I run g.next, the code whirs into motion. And the first yield passes us a response at the moment the headers are parsed, but before the body has been read.
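The lazy start is easy to see in isolation. The sketch below uses Enumerator as a stand-in for Generator, on the assumption that you are on a newer Ruby where Enumerator provides the same external `next`. The symbols and strings are placeholders for the response and its body chunks.

```ruby
log = []

# Enumerator stands in for Generator here (an assumption for newer Rubies).
g = Enumerator.new do |y|
  log << :request_sent
  y << :headers            # first yield: the response, body still unread
  log << :reading_body
  y << "chunk 1"
  y << "chunk 2"
end

before      = log.dup      # creating the enumerator ran none of the block
first       = g.next       # runs the block up to the first yield
after_first = log.dup
second      = g.next       # resumes right after that yield
```

After the first `next`, only the code ahead of the first yield has run; the body-reading part waits until someone asks for another chunk.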
The hackiness has to do with getting WEBrick to actually accept the generator as an IO object. Naturally, there has to be a read method, which just calls next to get chunks of the stream. Adding size and close methods made sense as well. But I actually had to override is_a? to let my duck-typed generator pass through. Ugghh.
Overall, it works well. I didn’t even notice much slowness. Until memory got gobbled up. Which happens rather quickly with continuations.
Punching Holes in Net::HTTP
Now that I had a decent angle on this, I decided to take a different approach: to rework Net::HTTP as an IO object. I would have loved to use OpenURI but it doesn’t support all the HTTP request methods. Also, http-access2 seemed to have the same problems as Net::HTTP.
The problem is that Net::HTTP will only work as a stream when it’s given a block. If no block is given, the whole stream is read into memory and returned.
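Here is a small sketch of that block-only streaming. So it can run without touching the web, it talks to a throwaway local TCPServer; that server, its canned response, and the variable names are assumptions for illustration only.

```ruby
require 'net/http'
require 'socket'

# Throwaway local server so the example runs without touching the web.
server = TCPServer.new('127.0.0.1', 0)
port   = server.addr[1]
thread = Thread.new do
  client = server.accept
  loop do                      # discard the request line and headers
    line = client.gets
    break if line.nil? || line == "\r\n"
  end
  body = "hello " * 1000
  client.write "HTTP/1.1 200 OK\r\nContent-Length: #{body.bytesize}\r\n" \
               "Connection: close\r\n\r\n#{body}"
  client.close
end

chunks = []
status = nil
http = Net::HTTP.new('127.0.0.1', port, nil)  # nil: skip any env proxy
http.start do
  http.request_get('/') do |res|
    status = res.code          # headers are available here...
    res.read_body do |chunk|   # ...and the body arrives piece by piece
      chunks << chunk
    end
  end
end
thread.join
server.close

total = chunks.join.bytesize
```

Everything interesting happens inside the block. Once `request_get` returns, the stream is finished, which is exactly what makes it awkward to hand to something like WEBrick that wants to do the reading itself.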
The gist of the hack is this:
require 'net/http'

module Net
  class HTTPIO < HTTP
    def request(req, body = nil, &block)
      begin_transport req
      req.exec @socket, @curr_http_version, edit_path(req.path), body
      begin
        res = HTTPResponse.read_new(@socket)
      end while HTTPContinue === res
      # Reshape this one response object so it can pass as an IO
      res.instance_eval do
        def read len = nil; ... end
        def body; true end
        def close
          req, res = @req, self
          @http.instance_eval do
            end_transport req, res
            finish
          end
        end
        def size; 0 end
        def is_a? klass; klass == IO ? true : super(klass); end
      end
      res
    end
  end
end
Since all the request methods route through Net::HTTP#request, it was just a matter of replacing that method. We’ve come to expect this in object-oriented languages. But what pushes the lever further in Ruby is how I can also redefine portions of the HTTPResponse object (using singleton methods), reshape it without needing to affect its original class.
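A stripped-down sketch of that reshaping, with a made-up `Reply` class standing in for HTTPResponse (the class and its methods are assumptions for illustration):

```ruby
# Reply is a made-up class for illustration.
class Reply
  def initialize(text); @text = text end
  def body; @text end
end

plain    = Reply.new("hello")
reshaped = Reply.new("hello")

# Methods defined inside instance_eval land on this one object only.
reshaped.instance_eval do
  def body; true end
  def size; 0 end
end

reshaped_body = reshaped.body              # the singleton version
plain_body    = plain.body                 # the class's original method
plain_sized   = plain.respond_to?(:size)   # Reply itself is untouched
added         = reshaped.singleton_methods.sort
```

Only `reshaped` picks up the new methods; every other `Reply` keeps behaving the way its class says it should.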
In all, I feel like some of the same old good practices would have made the hack easier:
- Please, no is_a? or === tests for IO objects. Duck type: use input.respond_to? :read.
- If you have a streaming IO class, don’t require a block in order to stream. I might need to take that stream out of scope with me. Also, what if I need to start and stop the stream?
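A quick sketch of the difference between the two kinds of test. `ChunkReader` and the two predicate methods are hypothetical names invented for this example.

```ruby
require 'stringio'

# ChunkReader is a hypothetical non-IO object that can still be read.
class ChunkReader
  def initialize(chunks); @chunks = chunks.dup end
  def read(_len = nil); @chunks.shift end   # nil when exhausted
end

# Class test: only real IO objects get streamed.
def streamable_by_class?(body); body.is_a?(IO) end

# Duck test: anything with a read method gets streamed.
def streamable_by_duck?(body); body.respond_to?(:read) end

reader = ChunkReader.new(["a", "b"])
strio  = StringIO.new("x")

class_says_reader = streamable_by_class?(reader)   # false
duck_says_reader  = streamable_by_duck?(reader)    # true
class_says_strio  = streamable_by_class?(strio)    # false: StringIO is not an IO
duck_says_strio   = streamable_by_duck?(strio)     # true
duck_says_string  = streamable_by_duck?("plain")   # false
```

Even StringIO from the standard library fails the class test, so a check on `respond_to?(:read)` accepts strictly more useful objects without letting plain strings through.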
In the end, I’m left feeling pretty guilty myself. I’ve hacked classes for my own use, but they’re pretty worthless outside of MouseHole. I gotta find a way to push this back into the original classes or something.


Kevin Ballard
Hey why, can you please add syntax colorizing to your code here? It sure would make it easier to read.
MrCode
Well it is like they say, you can never predict the strange way customers will use your products. In this case you are the customer and the product is Net::HTTP. I agree that using respond_to? is better than is_a?, and clearly here we have a good example of why.
Ruby in particular is possibly more “vulnerable to customer hacks” than other languages, because of how the classes are open. So you might as well write your code to be as flexible as possible.
BTW, this is all pretty impressive with the streaming and all. I took a quick look in the Wonderland days and it seemed tricky.
mu's
baffle! wooaang!