Alex Shinn wrote:
> Of course, we don't have ideal compressors. If you take a good
> compressor like bzip2 and run it on pure Japanese text (I took the
> complete text of Rashomon) you get
> bzip2 compressed EUC-JP: 5575 bytes
> bzip2 compressed UTF-8: 5752 bytes
> gzip compressed EUC-JP: 14918 bytes
> gzip compressed UTF-8: 15931 bytes
> about 6%, which was higher than I expected but nonetheless not a
> significant cause for concern. To me it would be well worth the cost
> of being able to use any language at all on a website.
>> UTF-8 may well have many advantages over euc-jp and sjis. But its
>> proponents do themselves, and it, a disservice to pretend that moving
>> to it does not involve trade-offs.
> I never said this, and certainly didn't intend to give such an
> impression; sorry if I did. I was just addressing one aspect of your
> concern, that of the excess bandwidth requirements of UTF-8, pointing
> out that in a web application where bandwidth was a concern the
> difference between encodings will be minimal. There are other
> applications where the size of UTF-8 may be more of a concern.
I am a late-comer to this party, but I'm also wondering about resources
on the "rendering" side: how much data memory, font space and processing
power does it take to handle EUC-JP, SJIS or UTF-8/16 on a device like a
mobile phone? I know this is moving away from the discussion of
bandwidth concerns and of being able to represent all the characters in
a given language. But from the embedded developer's perspective, I'd
expect these to be reasonable issues - one of the reasons given for
proposing the TRON Code was to have a system that could process
Japanese, etc. without requiring very large buffers and data stores.
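The raw buffer-size side of that question is easy to quantify, even if the font and processing costs are harder to pin down. A small sketch comparing the byte length of the same Japanese string in each encoding (the string is just an illustrative sample):

```python
# Bytes needed to store the same short Japanese string in each encoding.
# For BMP kanji: EUC-JP and Shift_JIS use 2 bytes per character,
# UTF-8 uses 3, and UTF-16 uses 2 (plus framing/BOM considerations).
text = "\u65e5\u672c\u8a9e"  # "日本語", a stand-in for real device text

for enc in ("euc_jp", "shift_jis", "utf-8", "utf-16-le"):
    print(enc, len(text.encode(enc)), "bytes")
```

So for pure Japanese text, UTF-8 buffers need to be roughly 1.5x the size of EUC-JP or SJIS buffers, while UTF-16 is on par with the legacy encodings - which may matter more on a constrained device than it does on the wire.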
As with everything in the world, I'm sure that each of the above has
specific use cases where it shines.
Received on Fri Jan 13 09:16:41 2006