(keitai-l) Re: SJIS encoding for "do"

From: Ben Hutchings <ben_at_decadentplace.org.uk>
Date: 04/21/03
Message-ID: <20030421144759.GM7180@bunthorne.i.decadentplace.org.uk>
On Mon, Apr 21, 2003 at 01:18:06PM +0900, Shawn wrote:
> 
>  Yes I literally get "&#12393;".
> 
> > If so, that's not SJIS.  
> Ok Thank you kindly.  That's what I needed to know
> > 
> > The ASCII string "&#12393;" should be interpreted by anything
> > interpreting HTML as the unicode character 0x3069, which is indeed
> > "do".
> 
> Which explains why it works in a browser and not inside a midp app which
> is looking for real SJIS characters.
> 
> Java stores its strings in Unicode so I am baffled at why I'd get an
> ASCII string when explicitly writing a Shift_JIS file.
> 
> Must be a java on RH8 prob that I need to solve.

I don't believe that is the problem.

HTML (and any SGML application) has a document character set which is
not necessarily the same as the character set that is used to write
it.  The document character set for HTML 4.0 and later is the UCS, aka
Unicode; for previous versions it was ISO 8859-1.  You can write an
HTML page in ASCII but use numeric character references such as
"&#12393;" to access the full document character set.
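
To make the distinction concrete, here is a minimal Java sketch of what
an HTML interpreter does with such a reference: it reads the number as
a position in the document character set (Unicode), and only afterwards
is the resulting character encoded as Shift_JIS bytes.  The class and
method names (NcrDemo, decodeNcrs, hex) are made up for illustration,
only decimal references are handled, and it assumes the JRE ships the
"Shift_JIS" charset.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NcrDemo {
    // Matches decimal numeric character references such as "&#12393;".
    private static final Pattern NCR = Pattern.compile("&#([0-9]{1,7});");

    // Replace each decimal reference with the Unicode character it names.
    // (Hypothetical helper; hex references and named entities are ignored.)
    public static String decodeNcrs(String html) {
        Matcher m = NCR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String ch = new String(Character.toChars(codePoint));
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Format bytes as upper-case hex, two digits per byte.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String decoded = decodeNcrs("&#12393;");
        // The reference names Unicode character 0x3069 ("do").
        System.out.println(Integer.toHexString(decoded.codePointAt(0))); // 3069
        // Encoding that character as Shift_JIS gives two bytes.
        System.out.println(hex(decoded.getBytes(Charset.forName("Shift_JIS")))); // 82C7
    }
}
```

The eight ASCII bytes of "&#12393;" are valid in a Shift_JIS file, which
is why the file looks fine in a browser but not to code expecting the
two bytes 0x82 0xC7.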

Now DoCoMo deviated from this with i-mode: not only must the character
encoding be Shift-JIS, but the document character set is also
Shift-JIS.  This means numeric character references must use Shift-JIS
numbers, which standard HTML tools won't produce.  Of course there is
no need to use such references, since there are no characters that
can't be encoded directly.
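
As a rough illustration of the difference, the Java sketch below builds
both kinds of reference for the same character.  It assumes that a
"Shift-JIS number" means the two-byte Shift_JIS code read as a single
decimal integer; that reading is an assumption, not something taken
from DoCoMo's documentation.  The names SjisRefDemo, unicodeReference
and sjisReference are hypothetical.

```java
import java.nio.charset.Charset;

public class SjisRefDemo {
    private static final Charset SJIS = Charset.forName("Shift_JIS");

    // Standard HTML: the number is the Unicode code point.
    public static String unicodeReference(String ch) {
        return "&#" + ch.codePointAt(0) + ";";
    }

    // Assumed i-mode style: the number is the Shift_JIS code of the
    // character, with its bytes read as one big-endian integer.
    public static String sjisReference(String ch) {
        byte[] bytes = ch.getBytes(SJIS);
        int code = 0;
        for (byte b : bytes) {
            code = (code << 8) | (b & 0xFF);
        }
        return "&#" + code + ";";
    }

    public static void main(String[] args) {
        String d = "\u3069"; // hiragana "do"
        System.out.println(unicodeReference(d)); // &#12393;
        System.out.println(sjisReference(d));    // &#33479; (0x82C7)
    }
}
```

In practice neither form is needed for i-mode: writing the character
itself with a Shift_JIS encoder avoids the question entirely.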

If you use SGML tools instead, you can make them use the correct
document character set by writing an SGML declaration and DTDs
specifically for i-mode HTML.  I did this at a previous employer in
order to validate pages against the specification (if you can call it
that!).

-- 
Ben Hutchings  |  personal web site: http://womble.decadentplace.org.uk/
Tomorrow will be cancelled due to lack of interest.
Received on Mon Apr 21 17:49:11 2003