(keitai-l) Re: Shift_JIS question

From: Ben Hutchings <ben.hutchings_at_roundpoint.com>
Date: 05/31/02
Message-ID: <Pine.WNT.4.43.0205311356300.2016-100000@BENWORLD.roundpoint.co.uk>

On Fri, 31 May 2002, Darren Cook wrote:

>
> > i am building a module which will allow users to merge
> > data from an ASCII source and a SHIFT-JIS source. does
> > it make logical sense to have all of the data files
> > (such as ASCII and Shift-JIS) in the same encoding
> > .... in other words, is ASCII a subset of
> > SHIFT-JIS, or vice versa?

I'll assume that you mean US-ASCII, as there are variants of this
specified in ISO 646 that are also sometimes called ASCII.

> Yes, 7-bit ASCII is a subset of Shift-JIS, so all your data files can be
> in shift-jis encoding.

No, it isn't!  In US-ASCII, the code 0x5C is a backslash, but in Shift-JIS
this code is used for the yen symbol.  (Likewise, 0x7E is a tilde in
US-ASCII but an overline in Shift-JIS.)
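
To make that concrete, here is a minimal sketch of a check for whether a
US-ASCII buffer keeps the same meaning when relabelled as Shift-JIS.  The
function name is hypothetical, and it assumes the JIS X 0201-based
definition of the single-byte range, in which 0x5C is the yen sign and
0x7E is the overline; some vendor variants handle these two codes
differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from any library): returns true if every byte
 * of a US-ASCII buffer denotes the same character when the buffer is
 * reinterpreted as Shift-JIS.
 * Assumption: the JIS X 0201 Roman set applies to bytes below 0x80, so
 * only 0x5C and 0x7E differ from US-ASCII. */
bool ascii_text_is_sjis_safe(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char b = buf[i];

        if (b >= 0x80)               /* not 7-bit US-ASCII at all */
            return false;
        if (b == 0x5C || b == 0x7E)  /* backslash/yen, tilde/overline */
            return false;
    }
    return true;
}
```

Data that passes this check can be treated as Shift-JIS without any
conversion; data that fails it (a Windows path, say) needs a decision
about what the backslash or tilde should become.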

What the item on the Python list was saying was that the second byte of a
two-byte character may take a value that, on its own, also represents a
single-byte character; for example, 0x5C is a valid second byte.  This
means that searching for characters in Shift-JIS strings requires
awareness of the multi-byte encoding; for example, in C, strchr() and
strstr() work byte by byte, so they can report a match in the middle of a
two-byte character.
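
As a rough illustration of what "awareness of the multi-byte encoding"
involves, here is a sketch of a strchr()-like search that steps over
two-byte characters as units.  Both function names are hypothetical.  The
lead-byte ranges used (0x81-0x9F and 0xE0-0xFC) are an assumption that
covers common vendor extensions; plain JIS X 0208 only uses lead bytes up
to 0xEF.

```c
#include <stddef.h>

/* Returns nonzero if b starts a two-byte Shift-JIS character.
 * Assumption: 0xF0-0xFC are included to cover vendor extensions. */
static int sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper: find the first occurrence of the single-byte
 * character c in a NUL-terminated Shift-JIS string, never matching the
 * second byte of a two-byte character.  Unlike strchr(), it returns
 * NULL when c is 0 rather than a pointer to the terminator. */
const char *sjis_strchr(const char *s, unsigned char c)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p != '\0') {
        if (sjis_is_lead_byte(*p)) {
            if (p[1] == '\0')   /* truncated two-byte character */
                break;
            p += 2;             /* skip lead byte and trail byte */
            continue;
        }
        if (*p == c)
            return (const char *)p;
        p++;
    }
    return NULL;
}
```

For instance, the katakana character encoded as 0x83 0x5C contains no
yen sign, yet strchr(s, 0x5C) would return a pointer to its second byte;
the sketch above skips that byte and keeps looking.  A substring search
in the style of strstr() needs the same treatment, since a byte-wise
match can begin on a trail byte.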
Received on Fri May 31 16:06:14 2002