On Fri, 1 Jun 2007, Erick Papadakis wrote:
> My problem is that Japan seems to have had a devil of a time getting
> to standardize its character sets! Some big sites like isize.com use
> Shift_JIS, while others such as Goo or Mixi use EUC-JP, while several
> of the more modern ones (such as blogs) use UTF-8.
Some use all sorts. Starling's sites generally use UTF-8, but we convert
everything to Shift_JIS (on the fly) for Docomo phones.
> When we capture the TITLE (document.title) from these websites, and
> then "rawurldecode" the received text in PHP, the string comes up
> jumbled. If we knew the standard character set beforehand, we could
> have used the right mb_convert_encoding and such, but this is now an
Ideally, you use the character set encoding from the Content-type
pretty interested to hear about it.
If there's a META tag, as Christopher pointed out, you can give that a
try. But not everybody uses it (for good reason, actually, for those of
us who do on-the-fly conversion), and the encoding from the content-type
header overrides it, anyway.
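
To make that order of precedence concrete, here is a rough sketch in Python
(the PHP version would be along the lines of rawurldecode followed by
mb_convert_encoding). The function names are made up for illustration. It
assumes the captured title arrives percent-encoded in the page's own charset,
and that you still have the Content-type header and the start of the page
body to hand. Treat it as a starting point, not a complete charset sniffer.

```python
import re
from urllib.parse import unquote_to_bytes

# Hypothetical helpers for illustration; none of these names come from the thread.
_CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)
_META_RE = re.compile(r'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-type header value, or None."""
    if not content_type:
        return None
    match = _CHARSET_RE.search(content_type)
    return match.group(1) if match else None


def charset_from_meta(html_bytes):
    """Return the charset named in a META tag near the top of the page, or None."""
    # Charset names are ASCII, so non-ASCII bytes can be dropped for the search.
    head = html_bytes[:2048].decode("ascii", errors="ignore")
    match = _META_RE.search(head)
    return match.group(1) if match else None


def pick_charset(content_type, html_bytes, default="utf-8"):
    """Header first, META tag second, then an assumed default."""
    return (charset_from_header(content_type)
            or charset_from_meta(html_bytes)
            or default)


def decode_title(percent_encoded_title, charset):
    """Undo the percent-encoding to get raw bytes, then decode them.

    Raises LookupError if Python does not know the charset name.
    """
    raw = unquote_to_bytes(percent_encoded_title)
    return raw.decode(charset, errors="replace")
```

The point of going through raw bytes is that the percent-decoding step must
not guess an encoding on its own; only the final decode uses the charset you
picked.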
> Would appreciate any insight into how you have solved the issue of
> different in-coming text into programs.
For me over the past seven or eight years, mostly, it's been about
dealing with forms, and I generally just put a hidden text field in the
form with the character set encoding. (Browsers always post using the
encoding in which they received the page containing the form from which
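
A minimal sketch of the receiving side of that hidden-field trick, in Python.
The field name "_charset" and the function name are invented for the example,
and it assumes the form body arrives as an ordinary percent-encoded query
string. The charset name itself is plain ASCII, so it can be read before the
encoding of the other fields is known.

```python
import codecs
from urllib.parse import parse_qsl


def decode_form(body, charset_field="_charset", default="utf-8"):
    """Decode a percent-encoded form body using the charset named in a hidden field.

    charset_field is a hypothetical field name; use whatever your form defines.
    """
    # First pass: latin-1 maps every byte to a character, so it never fails,
    # and the ASCII charset name comes through unchanged.
    first = dict(parse_qsl(body, keep_blank_values=True, encoding="latin-1"))
    charset = first.get(charset_field) or default

    # Fall back to the default if the posted name is not a codec Python knows.
    try:
        codecs.lookup(charset)
    except LookupError:
        charset = default

    # Second pass: decode every field with the charset the form declared.
    return dict(parse_qsl(body, keep_blank_values=True,
                          encoding=charset, errors="replace"))
```

Since the hidden field is written by the same code that serves the page, it
always matches the encoding the page went out in, even when that is chosen on
the fly.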
Curt Sampson <cjs_at_cynic.net> +81 90 7737 2974
Mobile sites and software consulting: http://www.starling-software.com
Received on Tue Jun 5 12:58:58 2007