(keitai-l) Re: Removing "emoji" from a string...

From: J. David Beutel <jdb_at_getsu.com>
Date: 06/26/03
Message-ID: <Pine.LNX.4.44.0306261504560.12218-100000@tokimi.getsu.com>
On Thu, 26 Jun 2003, Christian Anderson wrote:

> I have found a few components that will detect what charset the string is,
> which is kinda there, but it still doesn't just keep the English or Japanese
> while trashing the emojis.  What I want to do is just check the string for
> the existence of the emoji, and if they are there to get rid of them.
> 
> Does anyone have a list of what those little characters show up as on a PC?
> I tried emailing them to my pc, but they just show up as obscure kanji,
> dots, and blank spaces...
> 
> Anyone?

I don't know exactly.  But for emoji that are not in a standard character
set, the provider must be using a custom character set, and perhaps a custom
character encoding.  Hopefully each provider documents its own
customizations.  Failing that, anyone should be able to work it out by
emailing every emoji from each provider to a PC and studying the character
codes in those emails; perhaps someone has already done so.  Ideally the
providers are using encodings that include user-defined characters and
assigning their emoji to those, instead of making up their own encodings.
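
To make that concrete, here is a rough Python sketch of both steps: dumping
the character codes so you can study them, and then dropping anything that
falls in the user-defined range.  It assumes the mail body has already been
decoded to a Unicode string and that the provider's emoji ended up in the
Unicode Private Use Area (general category "Co"); I have not verified that
for any particular provider.  The function names and the sample code point
U+E63E are only illustrations.

```python
import unicodedata


def dump_code_points(text):
    """Return the code point of every character, for studying a message."""
    return ["U+%04X" % ord(ch) for ch in text]


def is_private_use(ch):
    """True if ch is in a Unicode Private Use Area (category "Co").

    Assumption: the provider's emoji decode to private-use (user-defined)
    characters. If a provider invented its own encoding, this will not
    catch them.
    """
    return unicodedata.category(ch) == "Co"


def strip_private_use(text):
    """Return text with every private-use character removed."""
    return "".join(ch for ch in text if not is_private_use(ch))


if __name__ == "__main__":
    # U+E63E is just an illustrative private-use code point.
    sample = "Hi\ue63e!"
    print(dump_code_points(sample))
    print(strip_private_use(sample))
```

English and Japanese text pass through untouched, since only private-use
characters are filtered.  Emoji that a provider squeezed into some other
range would need their own check.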

Note that a provider's email gateway could convert emoji on the way out,
depending on the destination.  For example, ezweb could convert emoji sent
to a docomo.ne.jp address into the closest docomo emoji.  (I don't know if
they do; I'm just saying they could.)

What language are you using to handle the email?  The Perl module
Unicode::Japanese looks like it can handle some emoji, but I haven't tried
it myself.  See:
http://tech.ymirlink.co.jp/perl/cpan/Unicode-Japanese-0.18/Japanese.html

11011011
Received on Thu Jun 26 08:53:31 2003