(keitai-l) Re: Supported Character Sets for I-mode

From: Curt Sampson <cjs_at_cynic.net>
Date: 01/13/06
Message-ID: <Pine.NEB.4.63.0601131445030.9029@angelic.cynic.net>

On Fri, 13 Jan 2006, Nick May wrote:

> But the fact remains they would get the benefits noted in my last post
> if they ran it on eucjp.

Sorry to be rude, but the benefits you mentioned in your last post are
complete rubbish.

Let's have a look at slashdot.co.jp's top page and an article page with
150+ comments, compressed and uncompressed, in various encodings.

   compressed uncompressed  ratio uncompressed_name
        21808       142018  84.6% comments.utf-8.html
        20226       130989  84.5% comments.euc-jp.html
        20359       130989  84.4% comments.sjis.html
        15648        61434  74.5% top.utf-8.html
        14616        56637  74.1% top.euc-jp.html
        14632        56637  74.1% top.sjis.html

These sizes are for the HTML files alone, and do not include the style
sheets or images.
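
The post does not say how these figures were produced. A minimal sketch
of one way to make a similar comparison, assuming Python 3, its standard
gzip module, and a made-up sample string (SAMPLE and measure are
hypothetical names) rather than the actual Slashdot pages:

```python
import gzip

# Hypothetical sample: a short HTML fragment repeated to mimic a comment page.
SAMPLE = "<p>日本語のコメント</p>" * 50

# Python codec names for the three encodings compared in the table.
ENCODINGS = ("utf-8", "euc_jp", "shift_jis")


def measure(text):
    """Return {encoding: (compressed_bytes, uncompressed_bytes)} for text."""
    result = {}
    for enc in ENCODINGS:
        raw = text.encode(enc)
        result[enc] = (len(gzip.compress(raw)), len(raw))
    return result


if __name__ == "__main__":
    for enc, (packed, raw) in measure(SAMPLE).items():
        print(f"{packed:>10} {raw:>12}  {enc}")
```

The absolute numbers depend heavily on the input; a repeated sample like
this one compresses far better than real pages do.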

Uncompressed, EUC-JP and Shift-JIS are about 8% smaller than UTF-8;
compressed, about 7% smaller. For the compressed pages, you have
to send 10 packets rather than 9, which in a typical TCP connection
will increase download time by perhaps 3-4% (it's the latency for the
connection setup and request/response turnaround that eats a lot of time
in requests this size).
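
The size differences can be recomputed directly from the table. A small
sketch, assuming Python 3; the byte counts are copied from the table and
the names SIZES and saving are only for illustration:

```python
# (compressed, uncompressed) byte counts copied from the table above.
SIZES = {
    "comments": {
        "utf-8": (21808, 142018),
        "euc-jp": (20226, 130989),
        "sjis": (20359, 130989),
    },
    "top": {
        "utf-8": (15648, 61434),
        "euc-jp": (14616, 56637),
        "sjis": (14632, 56637),
    },
}


def saving(smaller, larger):
    """Percentage by which `smaller` is smaller than `larger`."""
    return 100.0 * (1.0 - smaller / larger)


if __name__ == "__main__":
    for page, encodings in SIZES.items():
        utf8_packed, utf8_raw = encodings["utf-8"]
        for name in ("euc-jp", "sjis"):
            packed, raw = encodings[name]
            print(
                f"{page:9} {name:7}"
                f" uncompressed {saving(raw, utf8_raw):4.1f}% smaller,"
                f" compressed {saving(packed, utf8_packed):4.1f}% smaller"
            )
```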

And this is buying you not just avoidance of pain in situations where
you have to interoperate with non-Japanese stuff, but is also, in fact,
improving your Japanese support: UTF-8 lets you encode some Japanese
characters that cannot be encoded in Shift-JIS or EUC-JP, yet there is
not a single character encodable in Shift-JIS or EUC-JP that cannot be
encoded in UTF-8.
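
As an illustration of the repertoire point, here is a small sketch
assuming Python 3 and its built-in shift_jis and euc_jp codecs. Those
codecs implement particular versions of the standards; vendor extensions
and the JIS X 0213 variants cover different repertoires, so this shows
the idea rather than proving it for every variant. The helper name
can_encode is hypothetical.

```python
def can_encode(text, encoding):
    """Return True if every character of text is encodable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


COMMON = "日本語"        # ordinary JIS X 0208 kanji
RARE = "\U00020BB7"      # U+20BB7, a variant form of 吉 outside JIS X 0208

if __name__ == "__main__":
    for label, text in (("common", COMMON), ("rare", RARE)):
        for enc in ("utf-8", "shift_jis", "euc_jp"):
            print(f"{label:6} {enc:9} encodable: {can_encode(text, enc)}")
```

Text that starts out in one of the legacy encodings can be decoded and
re-encoded as UTF-8 without loss, for example
COMMON.encode("shift_jis").decode("shift_jis").encode("utf-8").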

But if you're really that intent on shrinking your Asian text, just
use UCS-2 or UTF-16 and SCSU (Unicode Technical Standard #6 - A
Standard Compression Scheme for Unicode) and you'll find that both your
straight-Japanese and your straight-ASCII files, as well as almost
all of your files in between, are smaller than their EUC-JP or their
Shift-JIS equivalents.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974

***   Contribute to the Keitai Developers' Wiki!   ***
***           http://www.keitai-dev.net/           ***
Received on Fri Jan 13 08:21:55 2006