Not sure if this is possible, just throwing out ideas.
Maybe something like:
Googling around, bumped into this
Not all sites declare their encoding in a meta element, but it's
worth a shot. Once you've scraped the encoding info, send it to
your PHP script ahead of the Japanese text string, so the script
knows which encoding to convert from.
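
To make the idea concrete, here is a rough, untested-in-the-wild sketch
of the bookmarklet side. Everything named here is my own invention for
illustration: the function names, the "charset" and "title" parameter
names, and the endpoint URL are all assumptions, not anything from your
setup. It only reads what the page declares in its meta elements, so it
returns nothing useful for pages that declare no encoding.

```javascript
// Pull the charset out of a Content-Type string such as
// "text/html; charset=Shift_JIS". Returns null if none is declared.
function charsetFromContentType(content) {
  var match = /charset\s*=\s*["']?([^\s;"']+)/i.exec(content || "");
  return match ? match[1] : null;
}

// Scan the page's <meta> elements for a declared encoding.
// `doc` is normally `document`; it is passed in as a parameter so the
// function can also be tried outside a browser.
function declaredCharset(doc) {
  var metas = doc.getElementsByTagName("meta");
  for (var i = 0; i < metas.length; i++) {
    // Short form: <meta charset="...">
    var direct = metas[i].getAttribute("charset");
    if (direct) return direct;
    // Long form: <meta http-equiv="Content-Type" content="...; charset=...">
    var httpEquiv = metas[i].getAttribute("http-equiv") || "";
    if (httpEquiv.toLowerCase() === "content-type") {
      var found = charsetFromContentType(metas[i].getAttribute("content"));
      if (found) return found;
    }
  }
  return null;
}

// Build the request URL with the (hypothetical) charset parameter
// placed ahead of the title, so the script sees the encoding first.
function buildCaptureUrl(endpoint, doc) {
  var charset = declaredCharset(doc) || "unknown";
  return endpoint +
    "?charset=" + encodeURIComponent(charset) +
    "&title=" + encodeURIComponent(doc.title);
}
```

In the bookmarklet you would call something like
buildCaptureUrl("http://example.com/capture.php", document), where the
URL is a placeholder. One caveat: encodeURIComponent always produces
UTF-8 percent-escapes, so a title sent this way should reach the script
as UTF-8 bytes; the charset value matters most if the text is sent some
other way.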
On 6/1/07, Erick Papadakis <erick.papa_at_gmail.com> wrote:
> Seeing as how this list is aflutter with tech savvy folk, I hope
> someone can shed some light on this problem.
> We're developing something in Japanese that needs input from a
> because the text comes from client side using a bookmarklet. (If it
> could be a regular POST or GET, then there'd be no issues).
> My problem is that Japan seems to have had a devil of a time
> standardizing its character sets! Some big sites like isize.com use
> Shift_JIS, others such as Goo or Mixi use EUC-JP, and several
> of the more modern ones (such as blogs) use UTF-8.
> When we capture the TITLE (document.title) from these websites, and
> then "rawurldecode" the received text in PHP, the string comes up
> jumbled. If we knew the standard character set beforehand, we could
> have used the right mb_convert_encoding and such, but this is now an
> but that doesn't work either -- I wonder if that's a deprecated
> element of the document object?
> Would appreciate any insight into how you have solved the issue of
> incoming text arriving in different encodings. The PHP function
> "mb_detect_encoding" is totally useless. Given a string, it always
> seems to return UTF-8.
> Many thanks in advance!
Received on Sat Jun 2 05:29:49 2007