  1. #1
    Junior Member
    Join Date
    Jan 2009
    Posts
    16

    Do I need a site map?

    My head is spinning about site maps and I wonder if we even need one. The issue is that on our site we have a few folders of documents that we want our Search function to access. (We're using Google Custom Search.) The documents do not have any links in or out. The only way our users will find them is by search. So can we submit the URLs of the folders to Google instead of making a site map? The contents of the folders will change, so we don't want to have to resubmit site maps all the time. And there are more than 500 pages, so the online site map generators don't work.

    The site just launched so it is too soon to know for sure that Google is not going to find them on its own.

    I'm groaning at the idea of buying, learning, and regularly using a site map generating program.

  2. #2
    Moderator wildjokerdesign's Avatar
    Join Date
    Jun 2003
    Location
    Kansas City Mo
    Posts
    5,721

    If your documents do not have links to them, then there is going to be no way that Google can find them. Google indexes sites by following links, either on your site or on other sites, that lead to files/documents on your site.

    What type of documents are these? Are they HTML files or something else? Is there a reason you don't want to have links leading to them?

    The easiest way to create a listing of files in a directory would be with a .htaccess file that you place in the directory where you want the list. Let's say you have a directory with images in it and you want a list of those images, each linked to the file itself. Say something like this: http://www.puppetsandstuff.com/banners/

    I created that listing by placing this in a .htaccess file that is uploaded to that directory:
    Code:
    Options +Indexes 
    IndexIgnore *.html *.php
    HeaderName /banners/HEADER
    ReadmeName /banners/README
    IndexOptions +SuppressDescription +SuppressHTMLPreamble
    Let's go through that code line by line.

    Options +Indexes: This tells Apache that you want a list of files created in the directory.
    IndexIgnore *.html *.php: This says I don't want it to list files that end in .html or .php.
    HeaderName /banners/HEADER: This tells it that I have a file called HEADER that I want it to output before the listing.
    ReadmeName /banners/README: This tells it that I have a file called README that I want it to output after the listing.
    IndexOptions +SuppressDescription +SuppressHTMLPreamble: This limits the amount of information that is output with the listing, since I wanted a very simple listing.

    The last three lines really just have to do with how the page looks. I wanted to make this page match my main site a bit. You're actually using mod_autoindex, which is part of Apache, to do this. There is more information on what you can do with it here: http://httpd.apache.org/docs/2.0/mod/mod_autoindex.html

    What you might want to do is simply use this in your .htaccess file to start with and see if it fits your needs.
    Code:
    Options +Indexes
    Shawn
    Please remember your charity of choice: http://www.redcross.org

    Handy Links: wildjokerdesign.net | Plain Text Editors: EditPlus | Crimson

  3. #3
    Junior Member
    Join Date
    Jan 2009
    Posts
    16

    Shawn, thank you for that info. If I understand, with an .htaccess file we could allow people to see a list of all the documents in the folder, and the names would be clickable, which creates the links that Google needs in order to index our documents. (The reason the documents don't have any links is that they are contracts, still in .doc or .pdf format, not html files. There are way too many of them to convert and add links. We think they should get indexed by Google because that folder is pointed to in our custom search function.)

    The actual page is http://www.ibew1245.com/PGEcontractLibrary.html

    I have used "Google Custom Search" to point the first search box to a folder of documents, and it works. (For example, search for "hard hat" and you get 3 documents in results.)
    But the second search box is not finding any of the pdf documents in the folder I pointed it to. (I know the pdfs are searchable, and the path is correct.)
    So to use your idea, I could create an .htaccess file and upload that to the contracts folder, then add a link on my page that says "complete list of contracts" with the URL to the access file, and they would get a simple directory, like your example. This would at least give access to the documents and the list would always be current.

    What do you think?

    Kathy

  4. #4
    Moderator wildjokerdesign's Avatar
    Join Date
    Jun 2003
    Location
    Kansas City Mo
    Posts
    5,721

    Kathy,

    More than likely the reason you get the three hits on the first search box is that there are links someplace else that lead to those files. I have not really dug into the Google Custom Search feature that much, but the bottom line is that in order for Google to index the files in a directory on your site, it must follow links to them.

    So yes, if you upload a .htaccess file to each directory that you want a listing in and then create a link to that directory, you get the listing. You do not link to the actual .htaccess file, just to the directory you have placed it in. A link like http://www.ibew1245.com/ContractIndex/ would give you a listing if you had a .htaccess file in the ContractIndex directory.
    Shawn

  5. #5
    Moderator wildjokerdesign's Avatar
    Join Date
    Jun 2003
    Location
    Kansas City Mo
    Posts
    5,721

    If you wanted to place that list of links into the look of your site, you could "split" the HTML code of contractindex-results.html between a HEADER and a README file and upload those to the same directory. You split the code right after:
    Code:
    <div id="mainContent-noSidebars" class="newsText">
    Everything after that point you would put in the README file, and that line and everything above it you would put in the HEADER file. Hope that makes sense to you.
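
    A minimal sketch of how that split might be wired up in the directory's .htaccess, assuming the two halves are saved as HEADER.html and README.html (hypothetical file names; use whatever you actually upload) in the same directory as the documents:

    ```apache
    # Turn on the automatic directory listing (mod_autoindex).
    Options +Indexes
    # Top half of the page template, output before the listing.
    # Assumed file name; it is looked up relative to the directory being listed.
    HeaderName HEADER.html
    # Bottom half of the page template, output after the listing.
    ReadmeName README.html
    # HEADER.html already contains the <html>, <head> and <body> tags,
    # so stop Apache from writing its own HTML preamble first.
    IndexOptions +SuppressHTMLPreamble
    # Keep the template files themselves out of the listing.
    IndexIgnore HEADER* README*
    ```

    Since HEADER.html then carries the whole top of the page, it has to be valid HTML up to and including the opening div, and README.html should close whatever HEADER.html opened.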
    Shawn

  6. #6
    Moderator wildjokerdesign's Avatar
    Join Date
    Jun 2003
    Location
    Kansas City Mo
    Posts
    5,721

    Sorry to keep posting. I just noticed that you have a partial listing of the docs on your site. That is how Google found those files. I checked the Letters of Agreement page, which seems to be where the PDF files are, and the links on that page are invalid. In other words, they return a 404 error, meaning the files are not where they are supposed to be. That could be why the PDF search does not work. Double check the validity of the links and the paths you are using.
    Shawn

  7. #7

    Google Maps will allow you to add multiple points and then print the map only. It will still create directions, but you can ignore them and not print them.

  8. #8
    Junior Member
    Join Date
    Jan 2009
    Posts
    16

    Quote Originally Posted by wildjokerdesign View Post
    Sorry to keep posting. I just noticed that you have a partial listing of the docs on your site. That is how Google found those files. I checked the Letters of Agreement page, which seems to be where the PDF files are, and the links on that page are invalid. In other words, they return a 404 error, meaning the files are not where they are supposed to be. That could be why the PDF search does not work. Double check the validity of the links and the paths you are using.
    Shawn, thanks for spotting that page of bad links - those files were removed but they forgot about that page. That's fixed now.

    The .htaccess idea is working GREAT on my testing site (www.kifergraphics.com/TESTS/) but is not working on the real site, which would be www.ibew1245.com/PGE-docs/. I can tell the path is correct because I can open the header (www.ibew1245.com/PGE-docs/HEADER.html) and the icons. We tried stripping out everything in the .htaccess file except:
    Options +Indexes

    but it still doesn't work. Do you have any idea why not?

    Thanks for being so helpful. I think we are really close to getting this.

    Kathy

  9. #9
    Moderator wildjokerdesign's Avatar
    Join Date
    Jun 2003
    Location
    Kansas City Mo
    Posts
    5,721

    Is ibew1245 a secondary domain that has been added to an account? Perhaps kifergraphics.com is the main site on that account? If so you need to edit the httpd.conf file for the account. See this thread for information on that process. http://forums.westhost.com/showthread.php?t=14050
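
    A rough sketch of the kind of httpd.conf entry that may be involved, assuming the problem is that .htaccess overrides are not enabled for that site. The directory path here is a made-up placeholder, not the real path on the account:

    ```apache
    # Hypothetical document path; substitute the real one for the site.
    <Directory "/var/www/html/PGE-docs">
        # Let .htaccess files in this directory use the Options directive
        # and the mod_autoindex directives (HeaderName, IndexIgnore, etc.).
        AllowOverride Options Indexes
    </Directory>
    ```

    Apache has to be restarted after httpd.conf is edited before the change takes effect.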
    Shawn

  10. #10
    Junior Member
    Join Date
    Jan 2009
    Posts
    16

    No, it is not a subdomain, but that was a good thought. They are a separate domain also hosted by WestHost.

    Is there something else that would block an .htaccess file?
