(keitai-l) Re: Supported Character Sets for I-mode

From: Nick May <nick_at_kyushu.com>
Date: 01/13/06
Message-Id: <9F38B71C-39C6-49CC-8CA8-23B0EA47CE06@kyushu.com>
On 13 Jan 2006, at 16:46, Curt Sampson wrote:

> I should mention, as I forgot to earlier, that these were compressed
> with gzip at the default compression level. Just for a quick comparison,
> if anybody's curious, bzip2 -9 gives:

Ah good!

So a workaround to reduce the "UTF-8 tax" is to use a slow and
resource-intensive compression scheme like bzip2 -9. That MAY be
appropriate in some situations, but note that it gets us into a
trade-off with CPU load at the SERVER end. Very relevant if one's
server is already stretched.

Is there a mod_bzip yet? Or does one have to do it in one's output
layer? I note that bzip2 support has to be compiled into PHP
specially.

> I think that this pretty much explodes any arguments about UTF-8 versus
> EUC-JP if your main concern is data size; what do you do in terms of
> compression makes much, much more difference

Actually, what your figures suggest is that if you wish to avoid the
7 to 8% UTF-8 tax and cut it to something smaller, you HAVE TO use a
CPU-intensive compression scheme like bzip2 -9 rather than the standard
gzip. (Which rules out all those older browsers that can't handle
bzip2-compressed content.)
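For anyone wanting to reproduce this kind of comparison on their own pages, here is a minimal Python sketch (not from the thread; the helper names `sizes` and `tax_percent` and the sample page are hypothetical). It encodes the same text as UTF-8 and EUC-JP, compresses each with gzip at level 6 (the gzip command-line default; Python's `gzip.compress()` itself defaults to 9, so the level is passed explicitly) and with bzip2 at level 9, and reports how much bigger the UTF-8 version is.

```python
import bz2
import gzip


# Hypothetical helper: byte counts for one text in each encoding,
# both raw and after compression.
def sizes(text: str) -> dict:
    result = {}
    for enc in ("utf-8", "euc_jp"):
        raw = text.encode(enc)
        result[enc] = {
            "raw": len(raw),
            # Level 6 matches the gzip command-line default;
            # gzip.compress() on its own would use level 9.
            "gzip": len(gzip.compress(raw, compresslevel=6)),
            "bzip2-9": len(bz2.compress(raw, compresslevel=9)),
        }
    return result


# Hypothetical helper: percentage by which the UTF-8 version
# exceeds the EUC-JP version for a given method.
def tax_percent(text: str, method: str) -> float:
    s = sizes(text)
    euc = s["euc_jp"][method]
    return 100.0 * (s["utf-8"][method] - euc) / euc


if __name__ == "__main__":
    # Hypothetical sample page: ASCII markup plus Japanese body text.
    page = "<html><body>" + "こんにちは、世界。<br>" * 200 + "</body></html>"
    for method in ("raw", "gzip", "bzip2-9"):
        print(f"{method:8} UTF-8 tax: {tax_percent(page, method):5.1f}%")
```

Kana and kanji take 3 bytes in UTF-8 against 2 in EUC-JP, while ASCII markup is the same size in both, so the raw tax approaches 50% for pure Japanese text and shrinks as the markup share grows. Real pages will give different numbers, and wrapping the compress calls in `time.perf_counter()` would show the server-side CPU cost as well.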

Worth knowing, certainly - but hardly a glowing endorsement of UTF-8.


Nick
Received on Fri Jan 13 10:20:46 2006