(keitai-l) Re: Kanji, Hanzi and Unicode

From: Benjamin Kowarsch <benjk_at_mac.com> Date: 06/20/02 Message-Id: <D2695496-8431-11D6-9648-003065FB21DC@mac.com>

On Thursday, June 20, 2002, at 05:24, James Santagata wrote:

> I'm a little confused here how Unicode increases the number
> of characters that need to be encoded. My understanding is
> that Unicode only encodes characters, while a character's
> physical or visual representation is provided by glyphs
> whose delivery is provided by the fonts one selects.

That depends on how you interpret the meaning of "character". What I
meant to say was that the number of graphical representations that need
to be encoded or otherwise dealt with increases.

On my system though, Qi, Ki, qi all result in different codes.

Anyway, if the Unicode standard is said to cover 20000 characters so
far, then the question is whether that means 20000 graphical
representations of characters or 20000 actual characters.

Clearly, where different graphical representations are represented by 
the same code, all the characters that are "ancient versions" of 
existing characters don't really need to be explicitly covered. Where 
they are represented by different codes, they would need to be covered.

It would seem that part of the work involved in encoding is to decide
which characters are considered the same and which are considered to be
stand-alone, a task that may at least in some cases turn out to be
difficult because scholars may have different opinions.

> And that under Han Unification of CJK, the required number of
> encoded characters is actually greatly reduced, whereby you
> have one code point for "ki", no matter if that "ki" character
> is written differently in Kanji (Japanese) Hanja (Korean) or
> Hanzi (Simplified or Traditional Chinese).
>
> And then your desired glyph is delivered by the font you select.

That is my understanding in principle too, but it seems to me that it is
not always clear what constitutes "the same character". For example, on
my system Qi, Ki and qi produce different codes. Thus, if I have a text
with the character Qi in it, the character does not change from its
Traditional representation to a Japanese representation when I change it
from a Chinese to a Japanese font. Although I can see how this could be
achieved even if the codes are different.
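
To make the observation above checkable, here is a minimal Python sketch.
It assumes that "Qi", "Ki" and "qi" stand for the Traditional Chinese,
Japanese and Simplified Chinese forms encoded at U+6C23, U+6C17 and
U+6C14; this text shows only the transliterations, so that mapping is an
assumption, as are the labels and the helper name code_point. The sketch
only prints the code point and the Unicode character name of each form
and checks whether the three codes differ. It says nothing about which
glyph a given font will draw.

```python
import unicodedata

# Assumed mapping: the message itself only shows "Qi, Ki, qi".
variants = {
    "Qi (Traditional Chinese)": "\u6c23",
    "Ki (Japanese)": "\u6c17",
    "qi (Simplified Chinese)": "\u6c14",
}


def code_point(ch):
    """Format a single character as a U+XXXX code point string."""
    return f"U+{ord(ch):04X}"


# Code point of each form, keyed by the same labels as above.
codes = {label: code_point(ch) for label, ch in variants.items()}

for label, ch in variants.items():
    print(f"{label:26} {codes[label]}  {unicodedata.name(ch)}")

# True when no two of the three forms share a code point.
all_distinct = len(set(codes.values())) == len(codes)
print("all distinct:", all_distinct)
```

If the assumption holds, each form is a separate encoded character, so
switching from a Chinese to a Japanese font cannot by itself turn one
form into another; a font change only affects characters that share a
single code.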

Do you have the three writing systems installed on your system? I'd be
interested to learn if it is any different on yours ...

kind regards
benjamin