(keitai-l) Re: Sorting Yomi

From: Curt Sampson <cjs_at_cynic.net>
Date: 01/17/05
Message-ID: <Pine.NEB.4.58.0501171447350.2767@angelic-vtfw.cvpn.cynic.net>
On Sun, 16 Jan 2005, Alex Shinn wrote:

> The order you are using is the standard JIS order (almost - in all
> cases the small form sorts before the large form, which you have right
> for A, I, U, E, and O, but not YA, YU, YO or WA).

Thanks for this correction. A second look at my denshi-jishou confirms
that it does indeed sort small ya/yu/yo before large.
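For what it's worth, a quick check (a sketch; it assumes Python 3 and plain NFC hiragana) shows that in the Unicode Hiragana block each small kana is encoded exactly one code point before its large form. So this particular rule comes out right even with a naive code-point sort:

```python
# Small kana paired with their full-size forms (Unicode Hiragana block).
PAIRS = [("ぁ", "あ"), ("ぃ", "い"), ("ぅ", "う"), ("ぇ", "え"), ("ぉ", "お"),
         ("っ", "つ"), ("ゃ", "や"), ("ゅ", "ゆ"), ("ょ", "よ"), ("ゎ", "わ")]

for small, large in PAIRS:
    # Print both code points and confirm the small form comes first.
    print(f"{small} U+{ord(small):04X} < {large} U+{ord(large):04X}:",
          ord(small) < ord(large))
```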

> If the data is stored in the database with a JIS-based encoding (any
> of the standard Japanese encodings, plus Unicode also preserves this
> order) then PostgreSQL will sort this properly.
> ...for Hiragana you don't need to do anything special.

This is not true, because a sort based on the numeric character codes of
kana can't give a tokuon a lower precedence than the kana that follow the
kana carrying it. For example, 「じゃきょう」 sorts before 「しゃく」 in my
dictionary, but in a sort based on character codes, じ (0x3058) comes
after し (0x3057), so 「じゃきょう」 would sort after even 「しんぬ」.
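To make the difference concrete, here is a rough sketch of a two-level, dictionary-style sort key in Python. It is an assumption on my part, not how any particular dictionary or database does it, and it is far from a full implementation of JIS X 4061. The primary key strips the voicing marks (via NFD decomposition) and enlarges small kana; only on a tie does it fall back to raw code points. It assumes NFC hiragana input; katakana and the long-vowel mark are not handled. The helper names (primary, dictionary_key) are made up for illustration.

```python
import unicodedata

# Hypothetical helper table: small kana -> full-size kana.
SMALL_TO_LARGE = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")

# Combining voiced / semi-voiced sound marks that NFD splits off (e.g. じ -> し + U+3099).
VOICING_MARKS = {"\u3099", "\u309a"}

def primary(word: str) -> str:
    """Primary level: ignore dakuten/handakuten and small/large distinctions."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if ch not in VOICING_MARKS)
    return base.translate(SMALL_TO_LARGE)

def dictionary_key(word: str) -> tuple[str, str]:
    # Compare simplified forms first; raw code points only break ties,
    # which puts small before large and unvoiced before voiced.
    return (primary(word), word)

words = ["しんぬ", "しゃく", "じゃきょう"]
print(sorted(words))                      # ['しゃく', 'しんぬ', 'じゃきょう']
print(sorted(words, key=dictionary_key))  # ['じゃきょう', 'しゃく', 'しんぬ']
```

With the primary key, じゃきょう becomes しやきよう and しゃく becomes しやく, so き vs く decides the order, which matches the dictionary.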

> Kanji is the difficult thing to sort, which PostgreSQL can't handle
> because the characters have different pronunciations in different
> contexts and you would need full NLS to figure out the right one.

If names are involved, no NLS will get Japanese right, since a name's
reading often can't be derived from its kanji. The *only* approach that
works in all cases is to store the reading along with the kanji.
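As a minimal sketch of that approach (the Person record and the sample names are purely illustrative assumptions), each row carries its reading next to the kanji and the sort uses only the reading. Note that 中田 can be read なかた or なかだ, so the kanji alone can't even tell these two people apart:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    name: str     # the name as written in kanji
    reading: str  # the yomi in hiragana, entered alongside the name

people = [
    Person("東海林", "しょうじ"),
    Person("中田", "なかだ"),
    Person("小林", "こばやし"),
    Person("中田", "なかた"),
]

# Sort on the stored reading, never on the kanji. This uses plain code-point
# order; a dictionary-style key could be substituted for p.reading.
by_reading = sorted(people, key=lambda p: p.reading)
for p in by_reading:
    print(p.reading, p.name)
```

In a database the same idea is just a separate reading column that the query orders by.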

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974

Received on Mon Jan 17 07:58:06 2005