Can't WEBrick handle %u and the like? - 単なる日記＠はてな

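This is a follow-up to "Can't WEBrick handle %u and the like?" from ma2の日記. I tried supporting it in WEBrick::HTTPUtils::_unescape. Whether or not anyone actually uses the code below, if a change this small is all it takes, I get the feeling it would be supported officially if someone requested it.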

# -*- mode: ruby; coding: utf-8 -*-

require "webrick/cgi"

module WEBrick
  module HTTPUtils
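    # Replace the stock pattern so that it also matches %uXXXX.
    # remove_const is private, but it can be called directly inside the module body.
    remove_const(:ESCAPED)
    ESCAPED = /%([0-9a-fA-F]{2})|%u([0-9a-fA-F]{2})([0-9a-fA-F]{2})/

    def _unescape(str, regex)
      str.gsub(regex) do
        if $1
          # %XX: a single byte
          $1.hex.chr
        else
          # %uXXXX: two bytes, high byte first
          $2.hex.chr + $3.hex.chr
        end
      end
    end
    module_function :_unescape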
  end
end

require "test/unit"
require "iconv"

class TC_WEBrick__HTTPUtils < Test::Unit::TestCase
  HOGE_UTF8 = "ほげ"
  # Convert to UTF-16 and drop the first two bytes (the BOM).
  HOGE_UTF16 = Iconv.iconv("UTF-16", "UTF-8", HOGE_UTF8)[0][2 .. -1]

  def test_parse_query
    assert_equal({"foo" => HOGE_UTF8}, WEBrick::HTTPUtils.parse_query("foo=%e3%81%bb%e3%81%92"))
    assert_equal({"foo" => HOGE_UTF16}, WEBrick::HTTPUtils.parse_query("foo=%u307b%u3052"))
  end
end

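Ah, in the test code above I stripped the UTF-16 BOM without thinking, but I wonder if that is okay. Writing code by reflex without checking the spec is not a good habit. Sorry about that.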
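
One way to avoid the BOM question might be to ask for UTF-16BE explicitly. Below is a minimal sketch, not WEBrick code: `unescape_u` and `ESCAPED_U` are hypothetical names, it assumes Ruby 2.0 or later (String#encode and String#b instead of Iconv), and it assumes, like the patch above, that %uXXXX should turn into two big-endian bytes.

```ruby
# Hypothetical standalone helper (not part of WEBrick).
# Matches %uXXXX first, then plain %XX.
ESCAPED_U = /%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})/

def unescape_u(str)
  # Work on a binary copy so arbitrary bytes can be substituted in.
  str.b.gsub(ESCAPED_U) do
    if $1
      # %uXXXX: one 16-bit value packed as two big-endian bytes
      [$1.hex].pack("n")
    else
      # %XX: a single byte
      $2.hex.chr.b
    end
  end
end

hoge_utf8    = "ほげ"
# "UTF-16" output starts with a BOM; "UTF-16BE" output has none.
hoge_utf16   = hoge_utf8.encode("UTF-16").b
hoge_utf16be = hoge_utf8.encode("UTF-16BE").b

decoded = unescape_u("%u307b%u3052")
puts decoded == hoge_utf16be
puts decoded.force_encoding("UTF-16BE").encode("UTF-8")
```

With this, the expected value in the test could be built from "UTF-16BE" directly, so nothing has to be sliced off. Surrogate pairs and what a server should actually do with the decoded bytes are left out here.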