On Mon, 17 Jan 2005, Alex Shinn wrote:
> At Mon, 17 Jan 2005 14:57:59 +0900 (JST), Curt Sampson wrote:
> > This is not true, because sorts based on the numerical representation of
> > a kana can't give dakuon a lower precedence than kana following the kana
> > with dakuon. For example, 「じゃきょう」 sorts before 「しゃく」 in my
> > dictionary, but with a sort based on character codes, じ (0x3058) comes
> > after し (0x3057), and so じゃきょう would sort after even 「しんぬ」.
> Oops, sorry, don't mind me I was asleep when I replied :(
I have made the exact same mistake on this list in the past.
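The mistake is easy to reproduce. Here is a minimal Python sketch, using
only the three example words from the quoted message; plain sorted()
stands in for any sort based on character codes.

```python
# The three hiragana words from the example. Python compares strings
# code point by code point, so this is a "sort based on character codes".
words = ["じゃきょう", "しゃく", "しんぬ"]

by_code_point = sorted(words)
print(by_code_point)
# prints ['しゃく', 'しんぬ', 'じゃきょう']
# じ (U+3058) is greater than し (U+3057), so the voiced word lands last,
# even though a dictionary would list it first.
```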
> I think for hiragana only your algorithm works.
Right. But you could translate katakana in the same way, if you wanted,
with a little tweak or two to deal with elongation marks and so on, and
maybe adding a fourth digit if you really care to sort katakana after
hiragana when the words are exactly the same.
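One way to get the dictionary ordering for hiragana is sketched below.
This is not the digit-based scheme discussed in this thread. It is a
swapped-in technique that leans on Unicode NFD normalization to split a
voiced kana into its base kana plus a combining mark, and the name
kana_sort_key is made up for the illustration. It assumes hiragana-only
input and does nothing special for small kana, katakana or elongation
marks.

```python
import unicodedata

VOICED = "\u3099"       # combining dakuten
SEMI_VOICED = "\u309a"  # combining handakuten

def kana_sort_key(word):
    """Two-level key: base kana first, voicing marks as a tie-breaker."""
    base = []
    marks = []
    # NFD turns じ into し followed by the combining dakuten.
    for ch in unicodedata.normalize("NFD", word):
        if ch == VOICED and marks:
            marks[-1] = 1
        elif ch == SEMI_VOICED and marks:
            marks[-1] = 2
        else:
            base.append(ch)
            marks.append(0)
    # Tuples compare element by element, so voicing only matters when
    # the base kana of both words are identical.
    return ("".join(base), tuple(marks))

words = ["しんぬ", "しゃく", "じゃきょう"]
print(sorted(words, key=kana_sort_key))
# prints ['じゃきょう', 'しゃく', 'しんぬ']
```

Sorting katakana after hiragana for otherwise identical words would
mean appending one more element to the tuple, which plays roughly the
role of the extra digit mentioned above.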
> Including kanji, katakana and romaji the JIS standard includes 5
> collation levels - you can see an open source implementation of the
> full collation in Perl's Lingua::JA::Sort::JIS:
Ah, right. That was actually linked from my page, except due to an HTML
error it was hard to see. I didn't really understand the algorithm it
was using, though.
Curt Sampson <cjs_at_cynic.net> +81 90 7737 2974
*** Contribute to the Keitai Developers' Wiki! ***
*** http://www.keitai-dev.net/wiki/ ***
Received on Mon Jan 17 10:26:55 2005