ROBOTS
WWW robots:
It is easy to say that "robots (bots) are all internet super spies".
They can copy your website's information without notifying
you (or your webmaster).
Robot: a software program designed
to traverse (automatically download) the Web's
hypertext structure by retrieving a
document, and recursively retrieving all
documents that it references.
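
A minimal sketch of that traversal, in Python. To keep it
runnable without a network, the "web" here is a small dictionary
(PAGES) with made-up page names; a real robot would download
each document over HTTP instead. The names PAGES, LinkCollector
and crawl are only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML text.
# A real robot would fetch these documents over HTTP.
PAGES = {
    "/index.html": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "/a.html": '<a href="/b.html">B</a> <a href="/index.html">home</a>',
    "/b.html": "<p>no links here</p>",
}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in one document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, pages):
    """Retrieve a document, then recursively retrieve all documents it references."""
    visited = []

    def visit(url):
        # Skip documents already retrieved, and links that lead nowhere.
        if url in visited or url not in pages:
            return
        visited.append(url)
        collector = LinkCollector()
        collector.feed(pages[url])
        for link in collector.links:
            visit(link)

    visit(start)
    return visited


print(crawl("/index.html", PAGES))
```

The "visited" list is what stops the robot from looping forever
when two pages link to each other.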
Because robots read hypertext over HTTP,
a robot can often be recognized
by its user agent, server, IP address, and so on.
But you still cannot reject all robots
as long as your website is built as a
hypertext structure,
because there are too many bots.
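
A rough sketch of recognizing a robot by its user agent,
in Python. The hint words in BOT_HINTS and the sample
user-agent strings are assumptions made up for this example,
not an official list. Robots that hide behind a browser-like
name will pass the check, which is why you cannot reject them all.

```python
# Hypothetical substrings for illustration only; real robots use
# many other names and are free to lie about who they are.
BOT_HINTS = ("bot", "crawler", "spider")


def looks_like_robot(user_agent):
    """Return True if the User-Agent string contains a known robot hint."""
    ua = user_agent.lower()
    return any(hint in ua for hint in BOT_HINTS)


# A robot that announces itself is easy to spot.
print(looks_like_robot("ExampleBot/1.0 (+http://example.com/bot)"))
# A robot that pretends to be a browser is not.
print(looks_like_robot("Mozilla/5.0 (Windows NT 10.0) Firefox/120.0"))
```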
Be aware:
(Do not count on robots.txt.)
That is, you should not rely on robots.txt
to stop bots from reading your TOP SECRET PAGE.
Your TOP SECRET PAGE needs a real
guard or security to protect it,
not just a robots.txt Disallow line to keep readers away.
Ezer considers this a very big hole
in the internet world.
Think of it the other way: it is also easy
to create a robot that collects only
the disallowed links and builds a search engine from them.
You might call it a "disallow search zone".
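
To show how little work such a robot needs, here is a small
Python sketch that pulls every Disallow path out of a robots.txt
text. The SAMPLE file and the function name disallowed_paths
are invented for this example, and the parsing is simplified
(it ignores which User-agent group a line belongs to).

```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path found in a robots.txt text."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow means "nothing is disallowed"
                paths.append(value)
    return paths


# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /private/
Disallow: /top-secret.html  # please do not look
Allow: /public/
"""

print(disallowed_paths(SAMPLE))
```

Anyone can read robots.txt, so every path listed there is
announced to the whole internet.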
Attention: not all bots follow
the same rules when reading or obeying robots.txt.
General information can be found
at http://www.robotstxt.org/wc/robots.html
and http://www.w3.org/TR/html4/struct/links.html
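
For comparison, this is roughly what a well-behaved robot does
before it downloads a page, sketched with Python's standard
urllib.robotparser module. The rules in RULES, the agent name
"ExampleBot" and the helper polite_fetch_allowed are assumptions
for the example. Note that the check runs inside the robot itself;
nothing on your server forces a robot to perform it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
RULES = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def polite_fetch_allowed(url, agent="ExampleBot"):
    """Ask the parsed robots.txt whether this agent may fetch the URL."""
    return parser.can_fetch(agent, url)


print(polite_fetch_allowed("/public/page.html"))
print(polite_fetch_allowed("/private/page.html"))
```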