(keitai-l) Re: handwriting as Japanese input method for the keitai

From: Michael Turner <leap_at_gol.com>
Date: 01/04/02
Message-ID: <[email protected]>
Now that mainstream keitai (and peripherals thereof) can take digital
photos, there's a potential new handwriting input modality: write something
on paper, take digital photos of the page, and pass the images to an OCR
program.

Don't everybody start laughing at once.

You'd probably need to oversample quite a lot, combining the images to
simulate adequate resolution.  You know, like how the Hubble telescope was
hacked to handle that dent in its mirror?  (OK, OK.  But it could be done.)
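
A toy sketch of the oversampling idea, in one dimension. It assumes each
low-resolution frame is an ideal point sample of the same scene, shifted by
exactly one known sub-pixel step; real handheld shots would need their offsets
estimated first. The name interleave_frames is made up for illustration.

```python
def interleave_frames(frames):
    """Merge low-res frames, each offset by one sub-pixel step, into a finer grid."""
    factor = len(frames)        # number of frames = resolution gain (assumed)
    length = len(frames[0])     # samples per low-res frame
    fine = [0] * (factor * length)
    for shift, frame in enumerate(frames):
        for i, value in enumerate(frame):
            # Sample i of the frame shifted by `shift` lands here on the fine grid.
            fine[i * factor + shift] = value
    return fine


# An 8-sample "page", photographed twice at half resolution with a one-step offset.
scene = [0, 1, 4, 9, 16, 25, 36, 49]
frames = [scene[shift::2] for shift in range(2)]
print(interleave_frames(frames))
```

With perfectly known shifts the two coarse frames rebuild the fine scene
exactly; with blur, noise and guessed shifts you would only approximate it.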

(This could be kinda cool even without the OCR, come to think of
it--neanderthals like me, who keep a handwritten address book, could archive
every scribbled contact-list change on the spot, with a keitai button press,
and could stop worrying about losing everybody's address and phone number
like I seem to do every six months.)

Even with adequate resolution, though, you'd still have the handwriting OCR
problem.  This is about as hard as speech recognition, even if it doesn't
get as much attention.

Both speech recognition and handwriting OCR suffer from a similar
user-acceptance problem: if you're not doing it really, REALLY well, it's
not worth doing it at all.  98% correct doesn't cut it--at that level of
accuracy, you end up spending as much or more effort supplying the 2%
corrections as you would if you just typed the text straight in manually.
You need more like 99.5%+, and that is a tall order, given how hard it can
be to read your own handwriting, or make out what you said on tape.
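
To put rough numbers on that, here is a back-of-the-envelope sketch. It
assumes the accuracy figures are per character and that a page holds about
1000 characters; both are assumptions for illustration, not measurements.

```python
def corrections_per_page(accuracy, chars_per_page=1000):
    """Expected number of characters the user must find and fix on one page."""
    return round((1.0 - accuracy) * chars_per_page)


for accuracy in (0.98, 0.995):
    print(accuracy, corrections_per_page(accuracy))
```

Under those assumptions, 98% leaves about 20 fixes per page and 99.5% leaves
about 5, and each fix also costs the time spent hunting for it.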

Maybe there's some hope, though, in mobile phones being speech input devices
as well--a dumb speech recognizer taking noisy audio input might correct a
lot of the mistakes of a dumb handwriting OCR program scanning smudgy pages;
and vice versa.  Especially when using a pronunciation dictionary for
narrowing the choices.

This is not a new idea: Some years back, when I fancied myself an OCR
researcher, I was casting about for the various algorithmic components for
such a system, and I ran across some work done at IBM Watson on this very
approach.  But exactly.

And I thought I was so smart there for a few days.   :-(

The IBM researchers recorded test subjects speaking from texts, and put the
recordings through what was probably ViaVoice in embryo.  They scanned the
same linguistic corpus written out by hand into a research-toy handwriting
OCR program.  Then they combined the input using a fairly rudimentary
sequence analysis of the candidate recognition choices from both input
sources.  (Rudimentary compared to the human genome project, anyway.)

Overall recognition rates soared.  They were almost ... acceptable, even.

It's not hard to see how this works.

For example: We all say "innurnet", mostly, in speech.  A not-too-bright
speech recognizer might hear this as "inner net" (points for trying) and
guess wildly that it could also be "internet" (please please) or "in a net"
(oh no not again).

An infuriatingly dense OCR program might at least see one word here, rather
than three; a slightly less moronic one might suggest candidates like
"interned", "intoned", "internet" or "intranet", by the scrawled look of it.

And a cretinous sequence analyzer with a freshly-laundered drool-bib,
consulting a dictionary, might pronounce "internet" the winner, even if the
separate sources didn't rank this interpretation so highly.  It's a point of
overlap among the candidates, and anyway, "interned" and "intoned" don't
sound much like "innurnet" when you look up their pronunciations, so they're
easy to throw out.  And "internet" is closer to "innurnet" than is
"intranet."  (Most of the time.  This one gets me in trouble.)

Thus do three idiot children end up with a higher combined IQ, IF they have
a dictionary with pronunciations.  Sometimes you get lucky.
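
A minimal sketch of that combination step, not the IBM method. The
pronunciation dictionary, its rough phonetic spellings, and the scoring rule
(one point for appearing in both candidate lists, plus a string-similarity
score against what was heard) are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical pronunciation dictionary: word -> rough phonetic spelling.
PRONUNCIATIONS = {
    "internet": "inturnet",
    "intranet": "intruhnet",
    "interned": "inturnd",
    "intoned": "intohnd",
    "inner net": "innurnet",
    "in a net": "inuhnet",
}


def sound_score(heard, word):
    """Similarity (0..1) between what was heard and the word's dictionary pronunciation."""
    return SequenceMatcher(None, heard, PRONUNCIATIONS[word]).ratio()


def fuse(heard, speech_candidates, ocr_candidates):
    """Pick the candidate best supported by both recognizers and the dictionary."""
    def score(word):
        # Bonus for being a point of overlap between the two candidate lists.
        overlap = 1.0 if word in speech_candidates and word in ocr_candidates else 0.0
        return overlap + sound_score(heard, word)

    # Consider every candidate from either source, without duplicates.
    candidates = list(dict.fromkeys(list(speech_candidates) + list(ocr_candidates)))
    return max(candidates, key=score)


speech = ["inner net", "internet", "in a net"]
ocr = ["interned", "intoned", "internet", "intranet"]
print(fuse("innurnet", speech, ocr))
```

In this toy the overlap bonus does most of the work; the sound score is what
would throw out "interned" and "intoned" when the two lists share nothing.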

This scheme has its limits in English, it turns out.  The lower-case vowel
letters "aeou" exhibit a wide range of colorful pronunciations, most of them
indistinguishable from "uh".  When written, they are easily confused with
each other (and with "c", "r", "n", etc.) even if you're not an OCR program.
(I do this all the time, when I'm not busy drooling.)  A mutter here, a
scribble there ... you can imagine it.  Soon you'd be back to proof-reading
at the keyboard, backspacing through the audio, muttering threats and
scribbling notes like "FIRE the people who signed off on this hellish thing!
No, TORTURE them first!"

Japanese undoubtedly has its own problems, but it might be better suited in
both its sound system and its writing system to this
hit-'em-twice-from-different-angles approach.  One could make a case.

And the relevant recognition technologies have improved since the original
studies anyway.  After all, ViaVoice is shrinkwrap now.  (If we don't have
ViaScribble it's because the obvious market for it--the legal
profession--mainly produces impenetrable drivel anyway.)

Lack of horsepower on the handset side need not be a bottleneck.  It could
be done server-side for the most part.  The world is, after all, awash in
server capacity at this point, what with C&W buying Exodus (or was it the
remains of PSInet?  I can't keep up.)

In the end, though, even with excellent combined recognition rates, you have
those market-definition and user-acceptance problems.  Who's going to use
this?  When?  Where?  How often?  For what purposes?  Isn't it a lot of
trouble to write something out and then speak it?  Or, what's worse, to
speak something and then listen to your own droning voice played back,
writing out what you said, hitting the "backspace" button a zillion times?

The saving grace of on-line character recognition in handhelds is that it's
a bit like talking on the phone: very interactive and relatively economical
in its use of public space (even if it might not be compact enough and
convenient enough to be very popular for mobile phones--I have my doubts.)
Writing by hand, on paper, off-line, is a process that tends to sprawl,
spatially and temporally.  It's not truly mobile--it's hard to do it
standing up, for example.  Pen input also has a social precedent: taking
notes on paper notepads.  Speech input doesn't; it tends to make people

So maybe this approach I outline fits in somewhere, in the mainstream; it's
certainly nice that you could do it now, probably, with off-the-shelf
technology, but ...

Thumbtyping is a nice cultural match to the modern world--especially the
modern Japanese world.  I'm always struck, when I watch thumbtyping, by how
private, and laconic, and physically ... tidy and unobtrusive it is.  Rather
like the Japanese themselves.  I think thumbtyping is going to be hard to
replace.  Or even to supplement; there might not be much of a mobile market
for the kinds of input that thumbtyping doesn't accommodate very well.

Anyway, what we really need are PPAs: Personal Protoplasm Assistants.  I
think they used to call them "secretaries."

-michael turner

----- Original Message -----
From: "Paul Lester" <paul.lester@lincmedia.co.jp>
To: <keitai-l@appelsiini.net>
Sent: Wednesday, January 02, 2002 3:35 PM
Subject: (keitai-l) Re: handwriting as Japanese input method for the keitai

> My guess is that:
> About 2 years ago, someone in Japan
> started selling a touch screen phone.  It flopped completely.
> I think it's because of that consumer response that no Japanese
> company has tried anything similar with keitais....
Received on Sat Jan 5 13:45:49 2002