(keitai-l) Re: Supported Character Sets for I-mode

From: Curt Sampson <cjs_at_cynic.net>
Date: 01/13/06
Message-ID: <Pine.NEB.4.63.0601131636470.9029@angelic.cynic.net>
On Fri, 13 Jan 2006, Curt Sampson wrote:

> Let's have a look at slashdot.co.jp's top page and an article page with
> 150+ comments, compressed and uncompressed, in various encodings.
>
>   compressed uncompressed  ratio uncompressed_name
>        21808       142018  84.6% comments.utf-8.html
>        20226       130989  84.5% comments.euc-jp.html
>        20359       130989  84.4% comments.sjis.html
>        15648        61434  74.5% top.utf-8.html
>        14616        56637  74.1% top.euc-jp.html
>        14632        56637  74.1% top.sjis.html

I should mention, as I forgot to earlier, that these were compressed
with gzip at the default compression level. Just for a quick comparison,
if anybody's curious, bzip2 -9 gives:

comments.utf-8.html:   87.92% saved, 142018 in, 17157 out.
comments.euc-jp.html:  87.11% saved, 130989 in, 16879 out.
comments.sjis.html:    87.17% saved, 130989 in, 16808 out.
top.utf-8.html:        77.36% saved, 61434 in, 13907 out.
top.euc-jp.html:       75.93% saved, 56637 in, 13635 out.
top.sjis.html:         75.89% saved, 56637 in, 13655 out.

One way of looking at this is that for the 130 KB EUC-JP file, the
difference between it and the UTF-8 version after bzip2 compression is
278 bytes, or 0.21% of the original (smaller, EUC-JP) file size.
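
As a quick check of that arithmetic, here is a minimal Python sketch using
the sizes from the bzip2 -9 listing above. The variable names are made up
for illustration.

```python
# Sizes in bytes, copied from the bzip2 -9 listing above.
euc_jp_original = 130989     # comments.euc-jp.html, uncompressed
utf8_compressed = 17157      # comments.utf-8.html, after bzip2 -9
euc_jp_compressed = 16879    # comments.euc-jp.html, after bzip2 -9

# Extra bytes UTF-8 costs once both files are compressed.
difference = utf8_compressed - euc_jp_compressed

# The same difference as a percentage of the uncompressed EUC-JP file.
percent_of_original = 100.0 * difference / euc_jp_original

print(difference)                     # 278
print(round(percent_of_original, 2))  # 0.21
```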

I think that this pretty much explodes any arguments about UTF-8 versus
EUC-JP if your main concern is data size; what you do in terms of
compression makes much, much more difference.
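
To try this kind of comparison on your own data, something along the
following lines might do as a starting point. It is only a sketch: the
function name and the sample text are assumptions for illustration, and
the sample is far more repetitive than a real page, so its compression
figures will look better than the ones above. Level 6 is used for gzip
because that is the command-line tool's default.

```python
import bz2
import gzip

def compressed_sizes(text, encodings=("utf-8", "euc_jp", "shift_jis")):
    """Return {encoding: (raw, gzipped, bzipped)} byte counts for text."""
    sizes = {}
    for name in encodings:
        raw = text.encode(name)
        # Level 6 mirrors the gzip CLI default; mtime=0 keeps output stable.
        gz = gzip.compress(raw, compresslevel=6, mtime=0)
        # Level 9 mirrors "bzip2 -9".
        bz = bz2.compress(raw, compresslevel=9)
        sizes[name] = (len(raw), len(gz), len(bz))
    return sizes

# Made-up sample (an assumption): HTML-like markup with Japanese content.
sample = "<p>日本語のテキストを圧縮してサイズを比較します。</p>\n" * 200

for name, (raw, gz, bz) in compressed_sizes(sample).items():
    print(f"{name:10} raw={raw:6} gzip={gz:5} bzip2={bz:5}")
```

To measure real pages instead, read each file's text and pass it in place
of the sample string.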

I should also note, for those who might bring up the issue of CPU speed,
that on modern computers, uncompressing the compressed text on the fly
as you process it is likely to be significantly faster than processing
the uncompressed text directly; the cost of a main memory hit is dozens
of times the cost of a cache hit.
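
For what "uncompressing on the fly" can look like in practice, here is a
rough Python sketch that processes gzip data line by line without first
writing out or holding the whole uncompressed text. The function name and
the test payload are made up for illustration, and the sketch only shows
the mechanism; it does not measure or demonstrate any speed difference.

```python
import gzip
import io

def count_lines_streaming(gz_bytes, encoding="utf-8"):
    """Count text lines while decompressing the gzip data incrementally."""
    count = 0
    # gzip.open accepts a file-like object; "rt" decodes to text as it reads.
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding=encoding) as stream:
        for _line in stream:
            count += 1
    return count

# Made-up payload (an assumption): 1000 short lines of Japanese text.
payload = gzip.compress(("一行のテキスト\n" * 1000).encode("utf-8"))
print(count_lines_streaming(payload))  # 1000
```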

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974

Received on Fri Jan 13 09:47:00 2006