(keitai-l) Re: determining what is an i-mode page

From: Craig Dunn <craig.dunn_at_conceptdevelopment.net>
Date: 02/09/01
Message-ID: <OGEGIKAMGPHPOJLMMLMLMEJJCAAA.craig.dunn@conceptdevelopment.net>
--- could we really come up with a
--- standard that others in the community would follow?

who better than a group of concerned individuals such as those represented
on this list?

the 'Robot Exclusion Protocol'
http://info.webcrawler.com/mak/projects/robots/norobots.html
and subsequent META tag addition were (i *think*) defined by just such a
group. Involvement from other search engines would probably be good -- but
then there gooooes Goooogle's advantage, and they seem keenest to get it
right. BTW the Robots mailing list might be an interesting place to post a
question on any proposals... if google doesn't mind?? Most of the stuff on
there just seems to *assume* the content is HTML.

my 2¥ on META tags:

* The existing ROBOTS.TXT spec allows for rules by USER-AGENT. So if Google
tells us what they're sending, we could add
	User-agent: Docomo/Google_ROBOT	# or whatever
	Disallow: /'dirs-pages that aren't imode'	# multiple lines OK
which would narrow their search dramatically AND fit within existing and
accepted crawling rules. Note this assumes Google is crawling with a
different USER-AGENT for their imode catalog than they do for their web
catalog.
This works for HDML, WML, whatever IF you divide content by directory -- but
not if every page sniffs the USER-AGENT and serves a customised presentation
from the same URL.
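
A rough sketch of how a crawler would read such a rule, using Python's
standard urllib.robotparser. The token "ExampleImodeBot" and the /web/ and
/wap/ directories are made up for illustration; nobody has said what
USER-AGENT Google actually sends. Python's parser compares only the part of
the USER-AGENT before the first "/", so the made-up token has no slash in it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the robot name and directories are assumptions.
ROBOTS_TXT = """\
User-agent: ExampleImodeBot   # or whatever the i-mode crawler sends
Disallow: /web/               # multiple lines OK
Disallow: /wap/

User-agent: *
Disallow:
"""


def build_parser(text):
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for path in ("/i/index.html", "/web/index.html", "/wap/index.wml"):
        allowed = rules.can_fetch("ExampleImodeBot/1.0", path)
        print(path, "allowed" if allowed else "disallowed")
```

With content split by directory, the i-mode crawler is left with /i/ only,
while every other robot keeps the catch-all rule.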

* if you want a META tag and you're doing major USER-AGENT checking already,
just insert the META when ROBOT is in the USER-AGENT string. typically
robots don't like pages being 'customised' for them, but in this case it'll
save all your *other* users from a pointless 20 or so bytes of download per
page.
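
Something like the following is what that would look like server-side. It is
only a sketch: the META name and content below are placeholders (no tag has
been agreed on), and the check for "robot" in the USER-AGENT is the naive
substring test described above.

```python
import re

# Placeholder tag: the name/content values are assumptions, not a standard.
HYPOTHETICAL_META = '<meta name="x-content" content="imode">'


def add_robot_meta(html, user_agent, meta_tag=HYPOTHETICAL_META):
    """Insert meta_tag before </head>, but only when a robot is asking."""
    if "robot" not in user_agent.lower():
        return html  # ordinary handsets never download the extra bytes
    match = re.search(r"</head>", html, re.IGNORECASE)
    if match is None:
        return html  # no <head> section to put the tag in
    return html[:match.start()] + meta_tag + html[match.start():]


if __name__ == "__main__":
    page = "<html><head><title>menu</title></head><body>hi</body></html>"
    print(add_robot_meta(page, "DoCoMo/1.0/N502i"))
    print(add_robot_meta(page, "Example_ROBOT/1.0"))
```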

* how about a 'new' ROBOTS_CONTENT.TXT file standard, which is kinda like
USER-AGENT sniffing but not dynamic
	User-agent: *
	Content: imode/1.0
	Visit: /i/
			# blank line required to separate entries
	Content: imode/2.0
	Visit: /i/

	Content: hdml
	Visit: /j/
Microsoft created the 'favicon.ico' "standard" just so you could have a neat
icon on your address bar in IE. if Google's prepared to query our new file,
what overhead does it add?
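
To give a feel for the crawler-side cost, here is a minimal parser for the
file sketched above. The format is only a proposal, so the parsing rules are
assumptions: "#" starts a comment, a blank or comment-only line ends an
entry, each Content line opens an entry, and Visit lines attach to it. The
User-agent line is ignored here, which amounts to treating every file as
"User-agent: *".

```python
SAMPLE = """\
User-agent: *
Content: imode/1.0
Visit: /i/
        # blank line required to separate entries
Content: imode/2.0
Visit: /i/

Content: hdml
Visit: /j/
"""


def parse_robots_content(text):
    """Return {content_type: [paths]} for the proposed file layout."""
    visits = {}
    current = None  # list of paths for the entry being read
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            current = None  # blank line closes the entry
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line, skip it
        key, value = key.strip().lower(), value.strip()
        if key == "content":
            current = visits.setdefault(value, [])
        elif key == "visit" and current is not None:
            current.append(value)
    return visits


if __name__ == "__main__":
    for content_type, paths in parse_robots_content(SAMPLE).items():
        print(content_type, "->", ", ".join(paths))
```

One extra small static file per site, read once per crawl, is roughly the
same cost as fetching ROBOTS.TXT.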

just thinking out loud.

cd


Received on Fri Feb 9 08:50:08 2001