3w study

robots.txt > Disallow

Title:

Disallow: / tells robots not to read or search any file after the first "/", that is, everything after your domain name or IP address.
Disallow: with nothing after the colon means allow everything.
---------- attention -----------
Not all robots follow the same rules.
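To check these two rules yourself, Python's standard urllib.robotparser module can parse a robots.txt and answer "may this robot fetch this URL?". This is a minimal sketch: http://example.com stands in for your site, AnyBot is a hypothetical robot name, and the helper can_fetch is my own wrapper around the module, not part of it.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse the text of a robots.txt and ask if `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "Disallow: /" blocks everything after the domain.
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"
# "Disallow:" with an empty value blocks nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

for url in ("http://example.com", "http://example.com/", "http://example.com/a/b.html"):
    blocked_file = can_fetch(DISALLOW_ALL, "AnyBot", url)
    open_file = can_fetch(ALLOW_ALL, "AnyBot", url)
    print(url, blocked_file, open_file)  # each line ends with: False True
```

This parser gives the same answer for the address with and without the trailing "/"; other robots may behave differently.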


Description:

robots.txt must be placed at the root of the site and nowhere else, e.g. http://----.com/robots.txt

User-Agent: *
Disallow: /
(disallows all robots from searching all files)

------------- attention -----------
If you want to allow all search engines, I suggest one of these:
Way 1: Delete robots.txt, because you do not need one. However, without a robots.txt every robot visit creates a "file not found" error in your error log.
Way 2: Keep robots.txt BLANK (nothing in it), so no errors fill your error log when bots come.
Way 3: Have no robots.txt, but create a customized error page.

I do not recommend using
User-Agent: *
Disallow:
to allow all robots to read your files, because not all robots follow the same rules.

Big tip: read robots.txt line by line each time.
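The two points above (robots.txt lives only at the root, and a blank file allows everything) can be checked with a short Python sketch. It assumes http://example.com as a stand-in site and AnyBot as a hypothetical robot; robots_url and blank_file_allows are illustrative helper names, not library functions.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the one place a robot looks for robots.txt: the site root."""
    # An absolute path "/robots.txt" replaces the page's path and query.
    return urljoin(page_url, "/robots.txt")


def blank_file_allows(url: str) -> bool:
    """A blank robots.txt has no records, so nothing is disallowed."""
    parser = RobotFileParser()
    parser.parse([])  # an empty file has no lines
    return parser.can_fetch("AnyBot", url)


print(robots_url("http://example.com/folder/page.html"))  # http://example.com/robots.txt
print(blank_file_allows("http://example.com/folder/page.html"))  # True
```

Way 1 (no robots.txt at all) also means "allow everything" to this parser: RobotFileParser.read() treats a 404 response as allow-all. The difference between Way 1 and Way 2 is only what ends up in your own server's error log.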


Example Code:

http://----.com/robots.txt

----------- disallow all (*) after (http://----.com/)
User-Agent: *
Disallow: /

----------- disallow all (*) after (http://----.com/folder/)
User-Agent: *
Disallow: /folder/

-------------------------------------
http://www.w3.org/TR/html4/appendix/notes.html#h-B.4.1.1
Example: Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.

------------------------------------------
User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /hotnews/not4u.htm

------------------------
Allow only Googlebot: block all bots but allow Googlebot
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

------------------------
Block Googlebot entirely but allow Googlebot-Mobile to crawl pages.
PS: the Allow: line is not part of the original robots.txt standard, so robots other than Googlebot may ignore it.
User-agent: Googlebot
Disallow: /

User-agent: Googlebot-Mobile
Allow:

or

User-agent: Googlebot-Mobile
Disallow:

------------------------
Block /folder1/ but allow one file inside it:
User-Agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html

------------------------
Block every path that starts with /folder (e.g. /folder/, /folder.html, /folder2/):
User-agent: Googlebot
Disallow: /folder

------------------------
Block all URLs ending in .gif:
User-agent: Googlebot
Disallow: /*.gif$

------------------------
Block all URLs that contain a question mark (?):
User-agent: Googlebot
Disallow: /*?

------ more on how to block Googlebot ------
http://www.google.com/support/webmasters/bin/answer.py?answer=40364&topic=8846

------ more control of msnbot ------
http://search.msn.com/docs/siteowner.aspx?t=SEARCH_WEBMASTER_REF_RestrictAccessToSite.htm

------ more Yahoo bot info -----
http://help.yahoo.com/help/us/ysearch/slurp/

---------- the standard -------
http://www.robotstxt.org/wc/exclusion-admin.html

What to put into the robots.txt file

The "/robots.txt" file usually contains a record looking like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

In this example, three directories are excluded.

Note that you need a separate "Disallow" line for every URL prefix you want to exclude -- you cannot say "Disallow: /cgi-bin/ /tmp/". Also, you may not have blank lines in a record, as they are used to delimit multiple records.

Note also that regular expressions are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "Disallow: /tmp/*" or "Disallow: *.gif".
(The * and $ path patterns in the Googlebot examples above are extensions that Googlebot supports; the original standard quoted here does not define them.)

What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve. Here follow some examples:

To exclude all robots from the entire server
User-agent: *
Disallow: /

To allow all robots complete access
User-agent: *
Disallow:

Or create an empty "/robots.txt" file.

To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/

To exclude a single robot
User-agent: BadBot
Disallow: /

To allow a single robot
User-agent: WebCrawler
Disallow:

User-agent: *
Disallow: /

To exclude all files except one
This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "docs", and leave the one file in the level above this directory:
User-agent: *
Disallow: /~joe/docs/

Alternatively you can explicitly disallow all disallowed pages:
User-agent: *
Disallow: /~joe/private.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
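Python's urllib.robotparser applies rules in file order (the first match wins) and does not understand * or $ in paths, so it would not reproduce Googlebot's answers for the examples above. The sketch below is a hand-written matcher for the path rules of one record, assuming Google's documented behaviour: * matches any characters, a trailing $ means "the URL ends here", the longest matching rule wins, and Allow wins a tie. rule_to_regex and is_allowed are hypothetical helper names, and picking the right record for a User-agent is left out.

```python
from __future__ import annotations

import re


def rule_to_regex(value: str) -> re.Pattern[str]:
    """Compile one Allow/Disallow value into a regex anchored at the path start.

    Assumption: Google-style patterns, where '*' matches any run of
    characters and a trailing '$' means "the URL ends here". Without them
    the rule is a plain prefix match, as in the original standard.
    """
    anchored = value.endswith("$")
    if anchored:
        value = value[:-1]
    # Escape everything literally, then turn each '*' back into '.*'.
    body = ".*".join(re.escape(part) for part in value.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide whether `path` may be crawled under one record's rules.

    `rules` holds (directive, value) pairs, e.g. ("Disallow", "/folder1/").
    The longest matching value wins and Allow wins a tie. An empty value
    matches nothing, so a bare "Disallow:" leaves everything allowed.
    """
    best_len = -1
    allowed = True  # no matching rule: crawling is allowed
    for directive, value in rules:
        if not value:
            continue
        if rule_to_regex(value).match(path):  # match() anchors at the start
            is_allow = directive.lower() == "allow"
            if len(value) > best_len or (len(value) == best_len and is_allow):
                best_len = len(value)
                allowed = is_allow
    return allowed


print(is_allowed([("Disallow", "/help")], "/help.html"))   # False
print(is_allowed([("Disallow", "/help/")], "/help.html"))  # True
print(is_allowed([("Disallow", "/folder1/"), ("Allow", "/folder1/myfile.html")],
                 "/folder1/myfile.html"))                  # True
print(is_allowed([("Disallow", "/*.gif$")], "/images/logo.gif"))  # False
print(is_allowed([("Disallow", "/*?")], "/search?q=robots"))      # False
```

Longest-match means the order of Allow and Disallow lines inside a record does not matter here, which is the main difference from the first-match rule older robots use.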


Example Result:

User-Agent: *
Disallow: /


If you want to allow everyone to visit your whole site, it is better not to keep a robots.txt like this:
User-Agent: *
Disallow:
because it is sometimes confused with
User-Agent: *
Disallow: /
and it may cause strange results for http://----.com vs. http://----.com/



[ 9/7/2008 ]
