  1. #1
    Senior Member visible soul's Avatar
    Join Date
    Sep 2003
    Location
    Corpus Christi
    Posts
    111

    spiders .htaccess & robots.txt

    Hello all-

    Do search engine spiders crawl directories outside of the "www" folder? Does an .htaccess file keep robots out of a directory or do I need to add directories in my root to my robots.txt?

    On a related issue: I am running a bulletin board on my VDS. I've noticed Googlebot crawling it a lot lately. Is there something additional that I need to do to protect my SQL database so that robots don't index restricted information?

    Also, is it possible to add a specific bulletin board category or thread to the robots.txt file so that spiders index only selected categories or threads?

    Thanks for your replies.
    -DKC-
    "Beware of all enterprises that require new clothes." -Henry David Thoreau

  2. #2
    Senior Member torrin's Avatar
    Join Date
    May 2003
    Location
    Vista, CA
    Posts
    534

    Re: spiders .htaccess & robots.txt

    Quote Originally Posted by visible soul
    Do search engine spiders crawl directories outside of the "www" folder?
    Yes, if they're visible to the world, meaning reachable through a URL on your server. Make sure the directories you want private aren't, by configuring the server correctly and using .htaccess and .htpasswd files.

    Quote Originally Posted by visible soul
    Does an .htaccess file keep robots out of a directory or do I need to add directories in my root to my robots.txt?
    Yes, as long as you set up appropriate passwords. robots.txt will keep search engines out too, or at least the ones that actually pay attention to it; some don't. .htaccess with .htpasswd is more secure for this.
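
    To illustrate why robots.txt is only advisory, here is a minimal sketch of the check a well-behaved spider runs before fetching a page. It uses Python's standard urllib.robotparser; the directory names, the example.com URLs and the allowed() helper are made up for illustration and are not from your setup. A bot that skips this check is not stopped by anything, which is why the password protection matters.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory names are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/index.html",
        "http://example.com/private/notes.html",
    ):
        print(url, allowed("Googlebot", url))
```

    The check happens entirely on the spider's side. The server never enforces robots.txt, so anything truly restricted still needs .htaccess and .htpasswd.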

    Quote Originally Posted by visible soul
    On a related issue: I am running a bulletin board on my VDS. I've noticed Googlebot crawling it a lot lately. Is there something additional that I need to do to protect my SQL database so that robots don't index restricted information?
    I don't think the search engines will try to index a SQL database directly; they only see the pages your board serves. I'd have to know more about your setup to answer this.

    Quote Originally Posted by visible soul
    Also, is it possible to add a specific bulletin board category or thread to the robots.txt file so that spiders index only selected categories or threads?
    I don't know.
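
    That said, robots.txt rules match on URL path prefixes, so it may be possible if each category lives under its own path. Here is a rough, untested-on-your-board sketch, assuming hypothetical /forum/... paths (your board's real URLs will almost certainly differ) and using Python's standard urllib.robotparser to check the rules. It relies on the Allow directive, which was not in the original robots.txt convention but is honored by the major search engines. The crawlable() helper is just a name I made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: expose one forum category, hide the rest of the board.
# The Allow line comes first because Python's parser applies the first
# matching rule; crawlers that use longest-match get the same result here.
ROBOTS_TXT = """\
User-agent: *
Allow: /forum/announcements/
Disallow: /forum/
"""


def crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the hypothetical rules above let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/forum/announcements/thread-1.html",
        "http://example.com/forum/members/thread-2.html",
        "http://example.com/index.html",
    ):
        print(url, crawlable(url))
```

    As before, this only steers spiders that choose to obey robots.txt. Restricted categories should still be protected by the board's own permissions or by .htaccess.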

  3. #3
    Senior Member visible soul's Avatar
    Join Date
    Sep 2003
    Location
    Corpus Christi
    Posts
    111

    Many thanks, torrin.

    I appreciate your input. I thought that .htaccess would prohibit all unauthorized access, including spiders, but I wanted to get confirmation here. This board is great.

    -DKC-
    "Beware of all enterprises that require new clothes." -Henry David Thoreau
