On 13 Jan 2006, at 16:46, Curt Sampson wrote:
> I should mention, as I forgot to earlier, that these were compressed
> with gzip at the default compression level. Just for a quick
> comparison, if anybody's curious, bzip2 -9 gives:
Ah good!
So a workaround to reduce the "UTF-8 tax" is to use a slow,
resource-intensive compression scheme like bzip2 -9. That MAY be
appropriate in some situations, but note that it gets us into a
tradeoff with CPU burden at the SERVER end. Very relevant if one's
server is already stretched.
Is there a mod_bzip yet? Or does one have to do it in one's output
layer? I note that bzip support has to be specially compiled in to
PHP.
> I think that this pretty much explodes any arguments about UTF-8
> versus EUC-JP if your main concern is data size; what you do in
> terms of compression makes much, much more difference.
Actually, what your figures suggest is that if you wish to avoid the
7 to 8% UTF-8 tax and cut it to something smaller, you HAVE TO use a
CPU-intensive compression like bzip2 -9 rather than the standard
gzip. (Which rules out all those older browsers which can't handle
.bz files.)
Worth knowing, certainly - but hardly a glowing endorsement of UTF-8.
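
For anyone who wants to check the size/CPU tradeoff on their own
pages, here is a minimal sketch in Python. It is not the method used
for the figures quoted above. The sample text and the measure() helper
are made up for illustration, and a repeated sentence compresses far
better than real pages do, so substitute your own content before
drawing conclusions. It encodes the same text as UTF-8 and EUC-JP,
then compresses each with gzip at level 6 (the gzip command's
default) and with bzip2 at level 9, reporting sizes and time taken.

```python
import bz2
import gzip
import time

# Hypothetical sample text, NOT the data from this thread.
# Replace it with real page content for meaningful numbers.
SAMPLE = "日本語のテキストを圧縮して、サイズを比べてみます。" * 2000


def measure(text):
    """Return {(encoding, scheme): (raw_bytes, packed_bytes, seconds)}."""
    schemes = (
        # Python's gzip.compress defaults to level 9, so ask for 6
        # explicitly to mirror the gzip command-line default.
        ("gzip -6", lambda data: gzip.compress(data, compresslevel=6)),
        ("bzip2 -9", lambda data: bz2.compress(data, 9)),
    )
    results = {}
    for encoding in ("utf-8", "euc_jp"):
        raw = text.encode(encoding)
        for name, compress in schemes:
            start = time.perf_counter()
            packed = compress(raw)
            elapsed = time.perf_counter() - start
            results[(encoding, name)] = (len(raw), len(packed), elapsed)
    return results


if __name__ == "__main__":
    for (encoding, name), (raw, packed, secs) in measure(SAMPLE).items():
        print(f"{encoding:7} {name:9} {raw:8d} -> {packed:7d} bytes "
              f"in {secs * 1000:.2f} ms")
```

The timings are wall-clock figures from a single run, so treat them
as a rough indication of server-side cost only.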
Nick
Received on Fri Jan 13 10:20:46 2006