HTML Filtering For RedCloth
This isn’t a patch for RedCloth; it’s a method you can use to filter out HTML in general. It also works nicely with RedCloth output, and I use it for the comments on this site. It’s the best solution I can think of for world-writable files.
    class String
        ## Dictionary describing allowable HTML
        ## tags and attributes.
        BASIC_TAGS = {
            'a' => ['href', 'title'],
            'img' => ['src', 'alt', 'title'],
            'br' => [],
            'i' => nil,
            'u' => nil,
            'b' => nil,
            'pre' => nil,
            'kbd' => nil,
            'code' => ['lang'],
            'cite' => nil,
            'strong' => nil,
            'em' => nil,
            'ins' => nil,
            'sup' => nil,
            'sub' => nil,
            'del' => nil,
            'table' => nil,
            'tr' => nil,
            'td' => nil,
            'th' => nil,
            'ol' => nil,
            'ul' => nil,
            'li' => nil,
            'p' => nil,
            'h1' => nil,
            'h2' => nil,
            'h3' => nil,
            'h4' => nil,
            'h5' => nil,
            'h6' => nil,
            'blockquote' => ['cite']
        }

        ## Method which cleans the String of HTML tags
        ## and attributes outside of the allowed list.
        def clean_html!( tags = BASIC_TAGS )
            gsub!( /<(\/*)(\w+)([^>]*)>/ ) do
                raw = $~
                tag = raw[2].downcase
                if tags.has_key? tag
                    pcs = [tag]
                    tags[tag].each do |prop|
                        ['"', "'", ''].each do |q|
                            q2 = ( q != '' ? q : '\s' )
                            if raw[3] =~ /#{prop}\s*=\s*#{q}([^#{q2}]+)#{q}/i
                                pcs << "#{prop}=\"#{$1.gsub('"', '\\"')}\""
                                break
                            end
                        end
                    end if tags[tag]
                    "<#{raw[1]}#{pcs.join " "}>"
                else
                    " "
                end
            end
        end
    end
Be sure to use it after you convert your Textile to HTML.
    comment = RedCloth.new( entry.comment ).to_html
    comment.clean_html!
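To get a feel for what the filter keeps and what it drops, here is a minimal sketch that restates the method as a plain, non-mutating function so it runs without RedCloth installed. The names `DEMO_TAGS`, `clean_html`, and `sample` are assumptions made up for this example, and the hand-written HTML string merely stands in for RedCloth output.

```ruby
# Smaller, hypothetical tag table so the example stays short.
DEMO_TAGS = {
  'a'  => ['href', 'title'],
  'b'  => nil,
  'br' => []
}

# Condensed, non-mutating restatement of clean_html! from the post.
# Tags missing from the table become a single space; attributes not
# listed for a tag are dropped.
def clean_html(html, tags = DEMO_TAGS)
  html.gsub(/<(\/*)(\w+)([^>]*)>/) do
    raw = $~
    tag = raw[2].downcase
    next ' ' unless tags.key?(tag)

    pcs = [tag]
    (tags[tag] || []).each do |prop|
      # Try double-quoted, single-quoted, then unquoted attribute values.
      ['"', "'", ''].each do |q|
        q2 = (q != '' ? q : '\s')
        if raw[3] =~ /#{prop}\s*=\s*#{q}([^#{q2}]+)#{q}/i
          pcs << "#{prop}=\"#{$1.gsub('"', '\\"')}\""
          break
        end
      end
    end
    "<#{raw[1]}#{pcs.join(' ')}>"
  end
end

# Stand-in for the HTML that RedCloth would produce from a comment.
sample = '<a href="http://example.com" onclick="evil()">link</a> ' \
         '<script>alert(1)</script> <b class="x">bold</b>'
puts clean_html(sample)
```

In this sketch the `onclick` and `class` attributes disappear, and the `script` tags are replaced by spaces while their text content stays behind as plain text. The sketch does not check URL schemes in `href`, so that would still need separate handling.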
I’d like to make RedCloth’s built-in filter allow this kind of customization. It may even be worthwhile to have it scan for allowed CSS within a style declaration. On a Wiki, it’s nice to allow people to come up with widths and floating directions and detailed colors, you know?
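Scanning a style declaration for allowed CSS could look roughly like the sketch below. It is not part of RedCloth; `ALLOWED_CSS` and `clean_style` are hypothetical names, and the three properties and their value patterns are assumptions chosen to match the widths, floats, and colors mentioned above.

```ruby
# Hypothetical allowlist: property name => pattern its value must match.
ALLOWED_CSS = {
  'width' => /\A\d+(?:px|em|%)\z/,
  'float' => /\A(?:left|right|none)\z/,
  'color' => /\A(?:#[0-9a-f]{3}|#[0-9a-f]{6}|[a-z]+)\z/i
}

# Returns a style string containing only the declarations whose property
# is allowlisted and whose value matches that property's pattern.
def clean_style(style, allowed = ALLOWED_CSS)
  kept = style.split(';').map do |decl|
    prop, value = decl.split(':', 2).map { |s| s.to_s.strip }
    next if prop.nil? || value.nil?

    prop = prop.downcase
    pattern = allowed[prop]
    "#{prop}: #{value}" if pattern && value =~ pattern
  end
  kept.compact.join('; ')
end

puts clean_style('width: 200px; float: left; font-size: 500pt; color: #c00')
```

Matching each value against a strict pattern, rather than only checking the property name, is what keeps things like `expression(...)` or `url(...)` values out in this sketch.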
What do you think?
flgr
I think it is vulnerable to this attack:
>flgr
< plaintext>
flgr
flgr
flgr
Hm, holds up quite well. :)
flgr
I wonder what happens when I use newlines.
flgr
” style=”font-size: 500pt;”>Attribute injection?
< plaintext>
flgr
xal
My wish for RedCloth would be that the output could be created using a visitor-style approach. This would make it a lot easier to create different output scripts like tolatex and todocbook.
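A visitor-style design along those lines might be sketched as follows. Nothing here reflects RedCloth's actual internals: the node classes (`TextNode`, `StrongNode`, `ParaNode`) and the two visitors are hypothetical, and escaping of special characters is left out to keep the sketch short.

```ruby
# Hypothetical document nodes. Each node knows only how to hand itself
# to a visitor; it contains no output-format logic of its own.
class TextNode
  attr_reader :content

  def initialize(content)
    @content = content
  end

  def accept(visitor)
    visitor.visit_text(self)
  end
end

class StrongNode
  attr_reader :children

  def initialize(*children)
    @children = children
  end

  def accept(visitor)
    visitor.visit_strong(self)
  end
end

class ParaNode
  attr_reader :children

  def initialize(*children)
    @children = children
  end

  def accept(visitor)
    visitor.visit_para(self)
  end
end

# Shared helper: render all children of a node with the current visitor.
module RendersChildren
  def render_children(node)
    node.children.map { |child| child.accept(self) }.join
  end
end

# One visitor per output format. Escaping is omitted in this sketch.
class HtmlVisitor
  include RendersChildren

  def visit_text(node)
    node.content
  end

  def visit_strong(node)
    "<strong>#{render_children(node)}</strong>"
  end

  def visit_para(node)
    "<p>#{render_children(node)}</p>"
  end
end

class LatexVisitor
  include RendersChildren

  def visit_text(node)
    node.content
  end

  def visit_strong(node)
    "\\textbf{#{render_children(node)}}"
  end

  def visit_para(node)
    "#{render_children(node)}\n\n"
  end
end

doc = ParaNode.new(TextNode.new('Hello '), StrongNode.new(TextNode.new('world')))
puts doc.accept(HtmlVisitor.new)
puts doc.accept(LatexVisitor.new)
```

With this split, adding a new output format means writing one more visitor class and leaving the parsed document tree untouched.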
why