(keitai-l) Re: Supported Character Sets for I-mode

From: Nick May <nick_at_kyushu.com>
Date: 01/12/06
Message-Id: <C176D08A-62B8-4FD1-A679-C432BA27D576@kyushu.com>
On 12 Jan 2006, at 16:15, Curt Sampson wrote:

> Unicode advocates are just asking
> everybody using EUC-JP, Shift-JIS and ISO-2022-JP to replace those
> with a Unicode encoding on your public face, which loses you little to
> nothing,


Leaving aside markup (1 byte per character, even under UTF-8), images,
etc., and given that kana and kanji take 2 bytes in EUC-JP or Shift-JIS
but 3 in UTF-8, it loses you:

1) 33% of the capacity of your pipe (which means a lower peak  
capacity, too...)

2) 33% of your server's peak capacity to handle requests (alright -  
this is worst case scenario assuming no compression and assuming that  
the time for the server to deal with a request is directly related to  
the size of the file it outputs - which is almost certainly untrue.  
But still, the smaller the file, the faster the webserver can be rid  
of it and on to the next request...)

3) the 33% lower bandwidth costs you get from the smaller encoding.

4) 33% decrease in page load time (alright - another worst case  
scenario, for multiple reasons too obvious to mention - but you will  
probably see some change just because the file to be rendered is  
smaller.)
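
As a rough check on the 33% figure, here is a minimal Python sketch that
encodes one sample sentence in each encoding and compares byte counts. The
sentence and the helper names are my own, purely for illustration, and it
assumes pure kana/kanji text with no markup, which is the worst case for
UTF-8.

```python
# SAMPLE is an assumption for illustration; any run of full-width
# kana/kanji from JIS X 0208 behaves the same way.
SAMPLE = "日本語のテキストは文字コードによって大きさが変わります"


def encoded_size(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))


def saving_vs_utf8(text: str, encoding: str) -> float:
    """Fraction of bytes saved by `encoding` relative to UTF-8."""
    return 1 - encoded_size(text, encoding) / encoded_size(text, "utf-8")


if __name__ == "__main__":
    for enc in ("utf-8", "euc_jp", "shift_jis"):
        print(f"{enc:10s} {encoded_size(SAMPLE, enc):4d} bytes")
    print(f"EUC-JP saving vs UTF-8: {saving_vs_utf8(SAMPLE, 'euc_jp'):.0%}")
```

Real pages mix in ASCII markup, so the saving on an actual site will be
smaller than this.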

>  and gives you a lot of gains.

.... such as....

1a) the ability to serve in the same encoding as your database
uses... (but you could just use EUC-JP in the database)

2a) .... miscellaneous domain specific gains that may be relevant in  
that context.


Imagine an ultra high-volume text-based site like Slashdot were to be
run in Japanese. Peak capacity and bandwidth costs are VERY relevant
concerns. What encoding would be sensible for it to use to serve to
ordinary browsers? Given 1, 2 and 3 above, EUC-JP fits the bill FAR
better than UTF-8.
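
How much of that survives once markup is counted depends on the ratio of
ASCII to Japanese text. A minimal sketch of the comparison, where both
page fragments and the function name are hypothetical and invented for
illustration, not taken from any real site:

```python
def utf8_overhead(page: str, encoding: str = "euc_jp") -> float:
    """Extra bytes UTF-8 needs, as a fraction of the size in `encoding`."""
    legacy_size = len(page.encode(encoding))
    return len(page.encode("utf-8")) / legacy_size - 1


# Hypothetical fragments: one with no markup at all, one wrapped in a tag.
TEXT_ONLY = "今日のニュースです" * 50
WITH_MARKUP = '<p class="story">今日のニュースです</p>\n' * 50

if __name__ == "__main__":
    print(f"text only:   UTF-8 is {utf8_overhead(TEXT_ONLY):.0%} larger")
    print(f"with markup: UTF-8 is {utf8_overhead(WITH_MARKUP):.0%} larger")
```

Pure Japanese text comes out 50% larger in UTF-8 (the same thing as
EUC-JP being a third smaller); the more markup per story, the closer the
two sizes get.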

Changing a Japanese site from UTF-8 to EUC-JP is probably best
conceived of as "optimization"... ;-)

Ah - brave new world!

I am sure there are people who will say "bandwidth is cheap, servers  
are cheap - just throw a bigger pipe at it, and gruntier boxen." I  
bet they all drive SUVs when they take the kids to school, too...
Ugh! - I shudder gently in their general direction. Go see your  
accountant and see whether THEY buy into an encoding that decreases  
your capacity and increases your costs.

On 12 Jan 2006, at 16:15, Curt Sampson wrote:

>>  It doesn't matter
>> whether we convert at the gateway or not - the fact is that SJIS/EUCJP
>> is better suited to lowish bandwidth environments.
>
> Ah, on to real keitai stuff now!
>
> If a lower-bandwidth environment is the problem, I don't understand why
> you feel conversion at the gateway doesn't matter. So long as what's on
> the low-bandwidth network is suited to the low-bandwidth network, why does
> having low-bandwidth-suitable material on the high-bandwidth network (or
> not having it there) change anything?

It was unclear, I agree. The point I was trying to make is that if the
encoding has to be converted to something appropriate for that
environment before it reaches the handset, then it is probably NOT
inappropriate to serve it in that encoding directly from the server
(and, incidentally, benefit from 1, 2 and 3 above) unless one has a
very good reason not to. (Which one may well have.) Apart from that,
control freak that I am, I prefer things not to change encoding
between leaving my server and hitting the handset. It makes testing a
keitai-site in a browser even more iffy, for one thing, as well as
"will this data fill the phone cache" type calculations....
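
To illustrate the cache point: the answer depends on which encoding
actually arrives at the handset, not on the one the server emitted. A
minimal sketch, in which the 5 KB limit is a made-up number (not a figure
from any handset spec) and the page content and function name are
likewise hypothetical:

```python
# Hypothetical limit for illustration only, NOT a real handset figure.
CACHE_LIMIT_BYTES = 5 * 1024


def fits_in_cache(page: str, wire_encoding: str,
                  limit: int = CACHE_LIMIT_BYTES) -> bool:
    """True if `page`, encoded as it reaches the handset, fits in `limit`."""
    return len(page.encode(wire_encoding)) <= limit


# 1800 full-width characters: 3600 bytes in Shift-JIS, 5400 in UTF-8.
PAGE = "携帯サイトのテスト" * 200

if __name__ == "__main__":
    for enc in ("shift_jis", "utf-8"):
        print(enc, "fits" if fits_in_cache(PAGE, enc) else "does not fit")
```

If the gateway re-encodes the page, a size check done against the
server-side encoding can come out on the wrong side of the limit.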

>   they already accept UTF-8 and convert it
> appropriately. They just don't do it for web content.


Possible reasons (pure speculation):

1) They are NOT looking for ways to increase load on their servers
for a "benefit" that few users demand. (I assume that user demand to
be able to read Unicode-encoded email is high.)
2) They may also have a plan to move all handsets to Unicode (in
which case gateway conversion is going backwards).
3) Politics, bloody-mindedness.
4) Email is less time sensitive (I refer to latency) than browsing.
So they have more time to mess around with it.
5) There may be a "gotcha" of which we know nothing.

Nick
Received on Thu Jan 12 16:19:49 2006