(keitai-l) Re: Supported Character Sets for I-mode

From: Nick May <nick_at_kyushu.com>
Date: 01/13/06
Message-Id: <A1CDDE68-02B2-4A6E-BBAC-797BA095A0CE@kyushu.com>
On 13 Jan 2006, at 10:50, Alex Shinn wrote:

> High-bandwidth sites like Slashdot use mod_gzip, compressing on the
> fly.

Yes sure - as do lots of sites. Or something similar. I believe I  
mentioned compression.


>   Considering the verbosity of HTML this is a win no matter what
> encoding you use

Indeed.  And of course markup dilutes content.

> , and basically eliminates any size differences in the
> encodings.

Are you claiming that 3-byte UTF-8 is SO much more compressible than
2-byte EUC-JP that it is sufficient to make up the difference? That
would indeed be interesting and would remove a major issue with UTF-8.
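
One rough way to test the compressibility question is to encode the
same Japanese text both ways and compare sizes before and after gzip.
The sketch below is only that: the helper name encoded_sizes and the
sample string are made up for illustration, and a short sample says
little about real pages, so it prints the numbers rather than
asserting which encoding wins.

```python
import gzip


def encoded_sizes(text, encodings=("utf-8", "euc_jp")):
    """Return {encoding: (raw_bytes, gzipped_bytes)} for the given text.

    Hypothetical helper for illustration; not part of any library.
    """
    sizes = {}
    for enc in encodings:
        raw = text.encode(enc)
        sizes[enc] = (len(raw), len(gzip.compress(raw)))
    return sizes


# Assumed sample text; substitute a real page body for a meaningful test.
sample = "日本語のテキストは圧縮後も大きさが異なるのか" * 50

for enc, (raw_len, gz_len) in encoded_sizes(sample).items():
    print(f"{enc}: raw={raw_len} bytes, gzipped={gz_len} bytes")
```

Running this against a corpus of actual page bodies, rather than a
repeated sentence, would be the fairer comparison.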

Or are you referring to non-zero, non-trivial values of "basically  
eliminates"? In which case you are playing with words rather than  
addressing the point.

In fact, the less one uses plain old HTML, and the more one moves to
stylesheets for layout, the greater the percentage of a given page
served tends to be content (for all but the first page, when the
stylesheet is served and cached), thereby INCREASING the hit from
using 3-byte UTF-8 over 2-byte EUC-JP.
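
The arithmetic behind that can be sketched as follows, under the
simplifying assumptions that every content character is Japanese
(3 bytes in UTF-8, 2 in EUC-JP), every markup character is ASCII
(1 byte in both), and nothing is compressed. The function name
utf8_overhead is hypothetical.

```python
def utf8_overhead(content_fraction):
    """Ratio of uncompressed UTF-8 size to EUC-JP size for one page.

    content_fraction is the share of the page's characters that are
    Japanese content; the remainder is assumed to be ASCII markup.
    """
    if not 0.0 <= content_fraction <= 1.0:
        raise ValueError("content_fraction must be between 0 and 1")
    markup_fraction = 1.0 - content_fraction
    utf8_bytes = 3 * content_fraction + markup_fraction
    eucjp_bytes = 2 * content_fraction + markup_fraction
    return utf8_bytes / eucjp_bytes


# The ratio grows as markup shrinks relative to content.
for fraction in (0.1, 0.5, 0.9):
    print(f"content {fraction:.0%}: UTF-8 is {utf8_overhead(fraction):.2f}x EUC-JP")
```

With no Japanese content the ratio is 1.0, and with nothing but
Japanese content it reaches 1.5; how much of that survives gzip is a
separate question.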

It is one thing to assign different values to the various elements in
a tradeoff (bandwidth, peak capacity, page load times, encoding,
etc.) but quite another to deny that those elements in the tradeoff
exist at all. Which is what you seem to be doing. (But then the  
essence of your claim lurks somewhere beneath the murky semantic  
surface of the word "basically"!)

>
> In fact, http://slashdot.jp/ uses UTF-8 as its encoding.

Sure. I looked before I posted. More fool them, unless they have a  
good reason to**. I am interested in what it is rational to do, not  
what this or that site actually does. (In addition, they are not a
terribly high-bandwidth site, so bandwidth is far less of an issue
for them. But the fact remains that they would get the benefits noted
in my last post if they ran it on EUC-JP.)

What I WAS referring to in my post (this could have been clearer, I  
grant) was a site with the vast bandwidth requirements of OUR  
slashdot -  slashdot.org, but serving Japanese.

Incidentally - on the subject of "making changes to HTML", there were  
some figures worked out for how much Slashdot.org had saved itself in  
a year by going from its old format to its new CSS stylesheets. I
can't remember them offhand, but it was quite a lot of money. All by
changing their nice compressible, gzipped text.

Of course one should select encodings on a rational basis and choose  
one that is appropriate to the domain. That may well be UTF-8 - even  
fat boys get dates. But for certain types of constraints (and
bottlenecks) within certain domains, it is probably rational to select
a 2-byte encoding over a 3-byte one. Ultra-high-volume sites serving
mainly text and using stylesheets are a case in point.

UTF-8 may well have many advantages over EUC-JP and Shift_JIS. But its
proponents do themselves, and it, a disservice by pretending that
moving to it does not involve trade-offs.

Nick


** I can think of several good reasons why they might want to, but
these are all trade-offs against the saving in bandwidth that EUC-JP
would buy them.
Received on Fri Jan 13 06:25:49 2006