UTF-8 Plugin for Rails, Fine for Ruby 1.8
Here’s the story thus far. Ruby has no Unicode support in 1.8 (except for Regexps), but it is forthcoming and Matz has stated his intentions. In the meantime, there’s been quiet work to scrape up Unicode-aware String classes.
Today, Manfred Stienstra has lobbed a bunch of details on using the new UTF-8 encoding plugin for Rails. You can certainly use just the String class extensions in any other traditional Ruby stuff.
By creating this plugin we haven’t resolved all our problems. One of the biggest problems is that we can only process UTF-8 encoded strings. [...] Sure, there are solutions like iconv to re-encode this data, but life would be a lot simpler if we didn’t have to think about this.
This plugin by Julian Tarkhanov does require the Unicode library.
some1else
I can’t wait to see the day.
scritch
Isn’t the String class extension’s capitalize method wrong? Its first line is “byte_capitalize unless utf8_pragma?” but shouldn’t it be “return byte_capitalize unless utf8_pragma?”?
Danail
and the difference is?
scritch
The difference is that, with the return, it does not go on to call and return Unicode::capitalize(Unicode::normalize_KC(self)) when utf8_pragma? is false.
Manfred
No, Scritch is right. That is a bug in the string_overrides.rb.
I will send a patch to Julian.
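For anyone following along, here is a minimal sketch of the bug Scritch describes. It is not the plugin's code: DemoString, unicode_capitalize and both capitalize_* method names are made up for illustration, and the bodies of byte_capitalize and utf8_pragma? are placeholders standing in for whatever string_overrides.rb actually does. The point is only that, without the return, the guard line's result is thrown away and execution falls through.

```ruby
# Hypothetical stand-in for the overridden String class (assumption:
# the real plugin's method bodies differ; only the control flow matters).
class DemoString
  def initialize(text, utf8)
    @text = text
    @utf8 = utf8
  end

  # Placeholder for the plugin's "treat this string as UTF-8?" check.
  def utf8_pragma?
    @utf8
  end

  # Placeholder for the original byte-oriented capitalize.
  def byte_capitalize
    "bytes:#{@text}"
  end

  # Placeholder for Unicode::capitalize(Unicode::normalize_KC(self)).
  def unicode_capitalize
    "unicode:#{@text}"
  end

  # Without `return`, the first line's value is discarded and the
  # Unicode path always runs.
  def capitalize_buggy
    byte_capitalize unless utf8_pragma?
    unicode_capitalize
  end

  # With `return`, non-UTF-8 strings stop at the byte-oriented path.
  def capitalize_fixed
    return byte_capitalize unless utf8_pragma?
    unicode_capitalize
  end
end

plain = DemoString.new("abc", false)
puts plain.capitalize_buggy  # prints "unicode:abc"
puts plain.capitalize_fixed  # prints "bytes:abc"
```

When utf8_pragma? is true, both versions take the Unicode path, which is why the bug is easy to miss.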
Izidor
Like I said elsewhere, there are problems with this plugin. In short – there exists code which always wants byte-oriented String#slice, #count, etc. (file handling, net packet processing, db adapters, etc). If the string contains valid utf8 data, such code will fail.
It seems that at least Webrick contains some, because I had problems with Rails when running with Webrick and this plugin.
It is very dangerous to override String methods depending on string content.
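A rough sketch of the kind of breakage Izidor means, assuming a made-up length-prefixed wire format. The helpers frame_by_chars and frame_by_bytes are hypothetical, and the example uses String#bytesize from later Ruby versions to show the two counts side by side. Under the plugin, the worry is that code written against byte semantics silently gets character semantics instead.

```ruby
# Hypothetical framing helpers (assumption: a "length:payload" format
# where the receiver reads exactly `length` bytes off the socket).

# Counts characters: correct only while every character is one byte.
def frame_by_chars(payload)
  "#{payload.length}:#{payload}"
end

# Counts bytes: what the receiving end actually needs.
def frame_by_bytes(payload)
  "#{payload.bytesize}:#{payload}"
end

payload = "caf\u00e9" # "café": 4 characters, 5 bytes in UTF-8

puts frame_by_chars(payload) # prints "4:café"
puts frame_by_bytes(payload) # prints "5:café"
```

For pure ASCII data the two helpers agree, so the mismatch only surfaces once a multibyte character comes through.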