(keitai-l) Re: determining what is an i-mode page

From: Craig Dunn <craig.dunn_at_conceptdevelopment.net>
Date: 02/16/01
Message-ID: <OGEGIKAMGPHPOJLMMLMLEELACAAA.craig.dunn@conceptdevelopment.net>

lauren,

>>We'll also be looking for good start sites.  So, I'd appreciate any
>>suggestions, including the URLs of your own i-mode sites.  (Please mail me
>>directly w/ these, as I don't want to clutter the list!)

i've implemented the ROBOTS.TXT 'idea' on my site www.chiizu.com (or
http://i.chiizu.com for imode if you prefer - same site)

unfortunately our site is not very 'search' friendly -- most of the
functionality is behind a user login.

for the record, i think a parenthesised (Google) in your USER-AGENT string
is essential (i'd argue that the word ROBOT in there would be helpful too),
as it helps the site owner track robots on the site, allowing them to
monitor crawling behaviour, and alerting them to possible problems with
their ROBOTS.TXT file...
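
As a rough illustration of the tracking described above, here is a minimal sketch of how a site owner might pick crawler hits out of logged user-agent strings. The pattern, the function names and the sample strings are all assumptions made for illustration; the actual string Google will send is not given in this thread.

```python
import re
from collections import Counter

# Hypothetical pattern: a parenthesised token that names the crawler,
# e.g. "(Googlebot...)" or anything else containing "bot" or "google".
ROBOT_HINT = re.compile(r"\(([^)]*(?:google|bot)[^)]*)\)", re.IGNORECASE)

def robot_name(user_agent):
    """Return the parenthesised crawler token, or None for ordinary handsets."""
    match = ROBOT_HINT.search(user_agent)
    return match.group(1) if match else None

def count_robot_hits(user_agents):
    """Tally requests per crawler token so unusual crawling stands out."""
    return Counter(name for name in map(robot_name, user_agents) if name)

if __name__ == "__main__":
    # Made-up log entries, not real Google user agents.
    sample = [
        "DoCoMo/1.0/P503i",
        "DoCoMo/1.0/P503i (Googlebot/2.1)",
        "DoCoMo/1.0/P503i (Googlebot/2.1)",
    ]
    print(count_robot_hits(sample))
```

A handset that sends no parenthesised token is simply skipped, so the tally only covers agents that identify themselves.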

cd

-----Original Message-----
From: lauren@google.com [mailto:lauren@google.com]
Sent: Thursday, 15 February 2001 6:20 PM
To: keitai-l@appelsiini.net
Subject: (keitai-l) Re: determining what is an i-mode page

Hi again,

Thank you for all the suggestions.  We've been tossing them around, and
our plan is to take a "comprehensive" approach - robots.txt, tags, text,
etc.

We plan on having a separate i-mode crawl in which we'll use a
DoCoMo-style user agent for spidering.  (We already have separate crawls
for WML and HDML, in which we use UP/Nokia user agents w/ a parenthesized
"Googlebot..." at the end.)  [Incidentally, will the parenthesized
"Googlebot" throw anyone off?  I'll find out soon enough, I suppose.]

At first glance, however, it seems infeasible/undesirable to crawl each
potential page multiple times, using a different i-mode user agent every
time.  The final index should have just one representative copy of the
i-mode page that we can accurately search against.  However, once the user
clicks on the search result, he/she will go directly to the page & will be
served whatever customized output the site has, regardless of the specific
version we indexed.  Nick & Craig commented on this a bit... but tell me -
in general do sites redirect users to customized pages with different URLs
depending on the user agent or does the same URL have different content
depending on the user agent?  For example, if I go to site www.xxx.com/i
with, say, a P503i, will I be automatically redirected to another
page, such as www.xxx.com/i/p503i?  This could be problematic given the
scheme I described above.
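
To make the two cases in the question concrete, here is a small sketch that classifies a site once the same start URL has been fetched with several user agents. It assumes the fetching (and redirect following) is done elsewhere; classify_variation and the labels it returns are hypothetical names, not part of any crawler.

```python
def classify_variation(responses):
    """Classify how a site varies its output by user agent.

    responses maps a user-agent label to a (final_url, body) pair, where
    final_url is the URL reached after following any redirects.
    """
    final_urls = {url for url, _ in responses.values()}
    if len(final_urls) > 1:
        # Different agents ended up at different URLs.
        return "redirects"
    bodies = {body for _, body in responses.values()}
    if len(bodies) > 1:
        # One shared URL, but the content depends on the agent.
        return "same-url-different-content"
    return "identical"

if __name__ == "__main__":
    # Illustrative data only, using the example URLs from the message.
    observed = {
        "P503i": ("www.xxx.com/i/p503i", "<html>P503i page</html>"),
        "N502i": ("www.xxx.com/i", "<html>generic page</html>"),
    }
    print(classify_variation(observed))
```

In the "redirects" case the indexed URL depends on which agent did the crawl, which is where a single representative copy becomes a problem.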

I like the robots.txt idea (User-agent: DoCoMo/*) a *lot*, but once again
- is this something that most i-mode developers already do, or would we
have to spearhead something?
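
A minimal sketch of how a crawler might look for such a record, assuming the idea is that a site signals i-mode content by naming a DoCoMo agent in its robots.txt. The function name is hypothetical, and this only detects the record; it does not evaluate any Disallow rules.

```python
def names_docomo_agent(robots_txt):
    """Return True if any User-agent line in robots_txt starts with "DoCoMo"."""
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            if value.strip().lower().startswith("docomo"):
                return True
    return False

if __name__ == "__main__":
    print(names_docomo_agent("User-agent: DoCoMo/*\nDisallow:\n"))
    print(names_docomo_agent("User-agent: *\nDisallow: /private/\n"))
```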

And, yes, we do take robots.txt files seriously.  From what I've seen,
most "violations" are because the robots file didn't exist at the time of
the crawl or, if it did, the format was wrong.
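
For site owners who want to catch format problems before a crawl, here is a rough sketch of a checker. The message does not say which format errors are common, so the three checks below (missing colon, unknown field, Disallow with no preceding User-agent) are assumptions, and lint_robots_txt is a hypothetical helper.

```python
# Fields assumed valid here; anything else is reported as unknown.
KNOWN_FIELDS = {"user-agent", "disallow"}

def lint_robots_txt(text):
    """Return (line_number, message) pairs for lines a strict parser may reject."""
    problems = []
    seen_agent = False
    for number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.split("#", 1)[0].strip()  # ignore comments
        if not line:
            continue
        field, sep, _ = line.partition(":")
        name = field.strip().lower()
        if not sep:
            problems.append((number, "missing ':' separator"))
        elif name not in KNOWN_FIELDS:
            problems.append((number, "unknown field " + repr(field.strip())))
        elif name == "disallow" and not seen_agent:
            problems.append((number, "Disallow before any User-agent line"))
        if sep and name == "user-agent":
            seen_agent = True
    return problems

if __name__ == "__main__":
    for number, message in lint_robots_txt("Disallow /private\n"):
        print(number, message)
```

An empty result only means none of these particular checks fired; it is not a guarantee that every robot will read the file the same way.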

We'll also be looking for good start sites.  So, I'd appreciate any
suggestions, including the URLs of your own i-mode sites.  (Please mail me
directly w/ these, as I don't want to clutter the list!)

Lauren
