(keitai-l) Re: Supported Character Sets for I-mode

From: Nick May <nick_at_kyushu.com> Date: 01/13/06 Message-Id: <AE0081CB-EB48-4213-9D3F-16017F28EB59@kyushu.com>

I want to kill this compression thing once and for all. Too many of  
us (me included) are making statements for which we do not have hard  
evidence...
  Before I go doing any testing, are people agreed that this (below)  
will let me test post compression lengths for various encoding  
formats, given an input file of jpfile.txt? I don't want to mess  
around with something then have to field criticisms later....

Note: I don't know if PHP can do UTF-16, so that bit may not work...
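
One way to settle the UTF-16 question before running the real test is to ask the mbstring extension what it supports. This is only a minimal sketch, assuming mbstring is loaded; the variable names $supported and $has_utf16 are made up for illustration.

```php
<?php
// Sketch: check whether the mbstring extension lists UTF-16 as supported.
// Assumes mbstring is loaded; mb_list_encodings() returns the encoding
// names that this PHP build knows about.
$supported = mb_list_encodings();

// Strict comparison, so only the exact name "UTF-16" counts.
$has_utf16 = in_array("UTF-16", $supported, true);

echo "UTF-16 supported: " . ($has_utf16 ? "yes" : "no") . "\n";
```

If this prints "no", the UTF-16 step of the test would have to be dropped or handled some other way.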

It is knocked-up, verbose code designed for readability - I have not  
tested it yet.

Is this ok test code for php?

Nick

// grabs test file jpfile.txt
// determines current encoding
// converts from that to a specified encoding
// gzips converted text (including headers...)
// gets length of variables containing the gzipped text
// repeats for various encodings

// TO BE DONE: echo it all...

// put file content in variable
$str = file_get_contents("jpfile.txt");

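// get encoding from list of possibilities (strict mode).
// ISO-8859-1 and UTF-16 are left out of the list: ISO-8859-1 accepts any
// byte sequence, so it can mask EUC-JP and SJIS, and UTF-16 detection
// is unreliable as far as I know.
$encoding = mb_detect_encoding($str, "ASCII,JIS,UTF-8,EUC-JP,SJIS", true);
if ($encoding === false) {
    die("could not detect encoding of jpfile.txt\n");
}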

// convert to sjis
$str_sjis = mb_convert_encoding($str, "SJIS" , $encoding);

// gzip
$str_sjis_gz = gzencode($str_sjis, 9);

//get length of gzip file
$str_len_sjis_gz = strlen($str_sjis_gz);

// rinse, repeat.

$str_eucjp = mb_convert_encoding($str, "EUC-JP" , $encoding);
$str_eucjp_gz = gzencode($str_eucjp, 9);
$str_len_eucjp_gz = strlen($str_eucjp_gz);

$str_utf8 = mb_convert_encoding($str, "UTF-8" , $encoding);
$str_utf8_gz = gzencode($str_utf8, 9);
$str_len_utf8_gz = strlen($str_utf8_gz);

$str_utf16 = mb_convert_encoding($str, "UTF-16" , $encoding);
$str_utf16_gz = gzencode($str_utf16, 9);
$str_len_utf16_gz = strlen($str_utf16_gz);

// echo it all...
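
The same measurement can be sketched more compactly as a loop that also prints the results. This is only a sketch under stated assumptions, not a replacement for the script above: compressed_sizes() is a hypothetical helper, and the sample string is the UTF-8 bytes for a short Japanese word repeated 50 times, standing in for jpfile.txt. Because gzencode() output includes the gzip header and trailer, the sketch also records the raw gzdeflate() length, so the wrapper overhead can be seen separately from the compressed data.

```php
<?php
// Hypothetical helper (not part of the original script): converts $text
// from encoding $from into each target encoding and records the
// uncompressed length, the gzip length (with header and trailer) and the
// raw deflate length (no wrapper).
function compressed_sizes($text, $from, array $targets)
{
    $sizes = array();
    foreach ($targets as $to) {
        $converted = mb_convert_encoding($text, $to, $from);
        $sizes[$to] = array(
            "raw"     => strlen($converted),
            "gzip"    => strlen(gzencode($converted, 9)),
            "deflate" => strlen(gzdeflate($converted, 9)),
        );
    }
    return $sizes;
}

// Stand-in input: three kanji as UTF-8 bytes, repeated 50 times.
// With real data this would be the contents of jpfile.txt.
$sample = str_repeat("\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E", 50);

$sizes = compressed_sizes($sample, "UTF-8",
                          array("SJIS", "EUC-JP", "UTF-8", "UTF-16"));

// echo it all...
foreach ($sizes as $enc => $s) {
    echo $enc . ": raw=" . $s["raw"]
       . " gzip=" . $s["gzip"]
       . " deflate=" . $s["deflate"] . "\n";
}
```

A highly repetitive sample like this one compresses far better than real text, so the numbers it prints say nothing about the actual question. It is only there to show the shape of the test.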