  1. #1
    ABW Ambassador cusimano's Avatar
    Join Date
    January 18th, 2005
    Location
    Toronto, Canada
    Posts
    1,369
    sitemap.pl Google Sitemap Generator
    We are pleased to announce the release of the sitemap.pl Google Sitemap Generator script.

    http://www.c3scripts.com/sitemap/

    Dynamically generates Google Sitemap sitemap.xml and sitemap.txt files by automatically scanning your webserver's hard drive for files.

    Note: Intended for use with real files that are physically located on your webserver. Does not recognize virtual files generated via .htaccess (mod_rewrite).

    Add/Delete HTML files on your website and your sitemap will reflect those changes. No more need to manually generate a Google Sitemap. It's all automatic with sitemap.pl. Set it up and forget about it.

    Lifetime license: $14.95 as shareware -- try 100% fully functional version before you buy.

    Yours truly,
    Cusimano.Com Corporation
    per: David Cusimano

  2. #2
    No Longer Banned!
    Join Date
    January 19th, 2008
    Location
    Wilmer, Texas 75172
    Posts
    939
    I follow what you said about "your webserver". But what if I pay for Yahoo! hosting?

    Will it still work ?

    Steve
    DreamLinux.net | Registered Linux User 453976 | PM me to view our sites. It's a Google thing.

  3. #3
    ABW Ambassador cusimano's Avatar
    Your website needs to support cgi-bin and .htaccess (mod_rewrite) for sitemap.pl to work.

    Yours truly,
    Cusimano.Com Corporation
    per: David Cusimano

  4. #4
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    I'm looking at this, it looks extremely useful and at a good price

  5. #5
    ABW Ambassador 2busy's Avatar
    Having some problems, getting only a 400 response, I filled out a trouble ticket.
    EDIT:
    It's working now, and I see what took so long: it was also mapping my subdomains. I need a little more information on the "Optional Configuration file"; there is no hint on how to format it. I read the description and looked at the skip file, but there is no info on the format of the exclusions file. Help?

  6. #6
    ABW Ambassador cusimano's Avatar
    Note that sitemap.pl will display its output only if there is not an actual /sitemap.txt (or /sitemap.xml) file. It appears that your website has an actual .xml file already (DOMAIN.com/sitemap.xml) that was generated by some other sitemap generator software. If you want sitemap.pl to dynamically generate the /sitemap.xml file then you should remove your current /sitemap.xml file.

    For the format of the exclusions file, see:
    http://www.c3scripts.com/sitemap/doc...#configuration

    The exclusions file is a text file that you can create with a plain text editor such as Windows Notepad.

    Put one or more exclusions on each line. Separate each exclusion with a space, a tab, or a newline/return. Everything from # to the end of the line is ignored as a comment.

    To exclude all files with a certain filename ending (e.g.: .zip), specify that ending (e.g.: .zip); the dot is necessary.

    To exclude a directory and everything in it, specify /DIR/ (e.g.: /downloads/). Supports specifying subdirectories too (e.g.: /downloads/plugins/).

    To exclude a file, specify the file (e.g.: /downloads/plugins/email.zip).

    Wildcards are supported. Character * matches zero or more occurrences of any character. Character ? matches zero or more occurrences of any character EXCEPT for the / directory separator character.
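For readers who want to check a rules file before uploading it, here is a minimal sketch of the format described above. It is written in Python for illustration only and is not the code sitemap.pl uses. The function names (parse_exclusions, rule_to_regex, is_excluded) are hypothetical, and the matching strategy is an assumption: rules starting with a dot match the end of a path, rules ending with / match a directory prefix, and anything else must match the whole path.

```python
import re

def parse_exclusions(text):
    """Return the exclusion rules found in the text of an exclusions file."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]   # from '#' to end of line is a comment
        rules.extend(line.split())     # rules are separated by spaces or tabs
    return rules

def rule_to_regex(rule):
    """Translate one rule into a regex (assumed semantics, see lead-in)."""
    parts = []
    for ch in rule:
        if ch == "*":
            parts.append(".*")         # zero or more of any character
        elif ch == "?":
            parts.append("[^/]*")      # zero or more of any character except '/'
        else:
            parts.append(re.escape(ch))
    body = "".join(parts)
    if rule.startswith("."):
        return re.compile(body + r"\Z")        # filename ending, e.g. .zip
    if rule.endswith("/"):
        return re.compile(r"\A" + body)        # directory and everything in it
    return re.compile(r"\A" + body + r"\Z")    # one specific file

def is_excluded(path, rules):
    """True if any rule matches the site-relative path (e.g. /downloads/a.zip)."""
    return any(rule_to_regex(rule).search(path) for rule in rules)
```

With the rules `.zip /downloads/`, a path such as /files/a.zip or /downloads/plugins/index.html would be reported as excluded, while /index.html would not.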

    Yours truly,
    Cusimano.Com Corporation
    per: David Cusimano

  7. #7
    ABW Ambassador 2busy's Avatar
    I'll delete my sitemap and try again. I did create the sitemap.txt file following the instructions in the installation guide in the sitemap folder and generated a new DOMAIN.COM/sitemap.txt and it still is spidering all subdomains. Don't suppose there is any way to make it follow the rules in robots.txt?

    Thank you for the prompt response, with everyone gone to Boston I was expecting a lo0Oong wait.

  8. #8
    ABW Ambassador 2busy's Avatar
    I deleted the existing sitemap and the one generated by the script. I reuploaded the sitemap.txt file with the new rules and then went again to DOMAIN.COM/sitemap.txt. It overwrote my rules with a URL list following only the rules in the script, and generated a new sitemap.xml (only in the sitemap folder, not where I had deleted the existing sitemap.xml) with all the subdomains listed again. I had to put back my old sitemap.xml file and delete those generated by the script. The script does not use the rules I put in, and I followed the format of the ?skip file (without the <html> etc. tags). If it generates a sitemap.txt file whenever I visit the DOMAIN.COM/sitemap.txt page, it is going to overwrite my rules every time, isn't it? If it is going to generate a sitemap.txt file, shouldn't the rules file have another name?

  9. #9
    mega crap martyogelvie's Avatar
    Join Date
    January 18th, 2005
    Location
    Atlanta
    Posts
    608
    what about dynamic pages..?
    detailspage.asp?xxxx
    I have about 30 k of these..

  10. #10
    ABW Ambassador 2busy's Avatar
    If we can get clear information on how to write our own exclusion rules then I'm sure you would have no problem. The instructions tell you to make a .txt file called sitemap.txt with a list of your exclusions and put it in the same folder as the .pl script. When you run the script, it generates a new file called sitemap.txt that overwrites your rules (and ignores the rules you put in). I'm certain there is a way to do it, I just don't know what it is. It does appear to ignore all dynamically generated pages right off the shelf. Since I don't use .asp I cannot tell you for sure.
    This is a terrific asset if we can get some configuration help.

  11. #11
    ABW Ambassador cusimano's Avatar
    Save the exclusions file as cgi-bin/sitemap/sitemap-skip.txt

    I assume that your subdomains are in subdirectories, so exclude those subdirectories. For example, if mybooks.DOMAIN.com is actually the subdirectory /mybooks/ then add /mybooks/ to your exclusions file. Do that for all subdomains you want to skip.

    sitemap.pl saves the latest sitemaps at cgi-bin/sitemap/sitemap.txt and cgi-bin/sitemap/sitemap.xml. When you (or a spider) access /sitemap.txt (or /sitemap.xml), the .htaccess file causes cgi-bin/sitemap/sitemap.pl to be run *IF* the requested file does not exist (that's why I asked you to delete /sitemap.xml in my previous post). The first thing that sitemap.pl does is check whether cgi-bin/sitemap/sitemap.txt (and .xml) exist. If they are less than one hour old, they are output as is. Otherwise, sitemap.pl scans your site's directory for files, saves the output, then sends it out. Because of the cached sitemaps, subsequent requests within one hour will be processed extremely quickly.
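The one-hour cache described above can be sketched roughly as follows. This is an illustrative Python sketch, not sitemap.pl's actual Perl code; the names is_fresh and serve_sitemap and the scan callback are hypothetical, and the only detail taken from the post is the one-hour limit.

```python
import os
import time

CACHE_MAX_AGE = 3600  # seconds; the one-hour limit mentioned in the post

def is_fresh(path, max_age=CACHE_MAX_AGE):
    """True if the cached file exists and was modified less than max_age ago."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return False                   # no cached file yet
    return (time.time() - mtime) < max_age

def serve_sitemap(cache_path, scan):
    """Return the cached sitemap if it is fresh; otherwise rescan and re-cache.

    `scan` is a hypothetical callable that walks the site and returns the
    sitemap text; it only runs when the cache is missing or stale.
    """
    if is_fresh(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return f.read()
    body = scan()
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(body)
    return body
```

The design point is that the slow step (walking the disk) happens at most once per hour, no matter how many spiders request the sitemap in between.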

    The sitemap.pl script only works with actual files stored on the hard drive (on my server it scans about 500 files per second). It does not work with virtual files because it scans the hard drive rather than making HTTP requests (which would be much slower). Note that our DySE scripts automatically generate Google Sitemaps, so sitemap.pl is not required for DySE sites. A future version of sitemap.pl will allow you to specify URLs that you want included, but sitemap.pl will not spider from there; it will only include those URLs that you specify. That feature would at least enable you to specify the root of your virtual directory, and then it would be up to Google to spider from there (sitemaps don't have to specify every file on your website; Google will spider beyond it).

    Yours truly,
    Cusimano.Com Corporation
    per: David Cusimano

  12. #12
    ABW Ambassador 2busy's Avatar
    Thank you, Thank you! I believe that will resolve the problem. Where I went wrong was this line in the instructions:
    2. Configuration File

    An optional configuration file sitemap.txt located in the same directory where sitemap.pl is located can be used to tell sitemap.pl what files to exclude from the sitemap.

    The sitemap.pl script has a built-in list of common exclusions (e.g.: skip all .gif files). Thus you need to create a sitemap.txt only if sitemap.pl is listing files that you do not want to list in the sitemap.
    I truly appreciate your timely help. I'm off to try again.

  13. #13
    ABW Ambassador 2busy's Avatar
    Thumbs up
    Bingo!! It works very well now!
    It does not include the AvantLink shops, which are dynamic, but I believe if I rewrite the sitemap-skip.txt file I can get them to be included. It skipped all excluded directories and listed only the files I wanted to have listed.

    OK! Now let me find that link to go and pay..
    Done. Another Happy Customer!
    Last edited by 2busy; August 12th, 2008 at 04:06 PM.

  14. #14
    ABW Ambassador cusimano's Avatar
    The sitemap.pl documentation was updated recently.

    It erroneously stated to save the exclusions file to:
    cgi-bin/sitemap/sitemap.txt << THIS IS WRONG

    The documentation now says to save the file to:
    cgi-bin/sitemap/sitemap-skip.txt << THIS IS CORRECT

    Yours truly,
    Cusimano.Com Corporation
    per: David Cusimano

  15. #15
    Member
    Join Date
    June 21st, 2007
    Posts
    57
    Do you need a google xml site map if your site is already indexing properly?
    My site http://dealking.com seems to be fully indexed. Can a google site map help? Can it hurt?

  16. #16
    ABW Ambassador cusimano's Avatar
    Google.com states: "In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it."

    Google's Webmaster Help Center:
    - About Sitemaps
    - Working with Sitemaps

    Your website already has a sitemap.xml file (created by some other program) but I can't tell how old it is. Note that my sitemap.pl script scans the website's hard drive to find files (it discovers about 500 pages per second); it does not spider the website from the outside. Thus URLs with parameters will not be included (e.g.: /search.php?category=123), and dynamic webpages will not be included either.

    If your website consists mainly of real .html (or .php) files located on the hard drive, then sitemap.pl is a good Google Sitemap generator for you. It runs very fast, and you only need to set it up once and you can then forget about it. Google will use the sitemap as a guide to what to spider. Google does not guarantee that everything will be indexed. Also, Google will spider beyond the sitemap, that is, if Google finds a link to a page that does not appear in your sitemap, Google will spider that page, assuming that page is not blocked via your robots.txt file. With a Google sitemap, you're basically telling the Google spider, "I'll save you some time having to spider my website to find all my webpages; here's a list of all my pages and I'll even tell you the relative priority of each page and how often I typically update each page so you know how often you should revisit."
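To make the idea concrete, here is a rough Python sketch of what a disk-scanning sitemap generator does in general. It is not sitemap.pl's code: the function names, the default changefreq and priority values, and the choice of extensions are all assumptions for illustration. The XML element names (urlset, url, loc, lastmod, changefreq, priority) and the namespace come from the public Sitemap protocol.

```python
import os
import time
from xml.sax.saxutils import escape

def scan_files(doc_root, extensions=(".html", ".php")):
    """Walk the document root and return sorted (url_path, mtime) pairs."""
    found = []
    for dirpath, _dirs, files in os.walk(doc_root):
        for name in files:
            if name.endswith(extensions):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, doc_root).replace(os.sep, "/")
                found.append(("/" + rel, os.path.getmtime(full)))
    return sorted(found)

def build_sitemap_xml(base_url, entries, changefreq="weekly", priority="0.5"):
    """Render the entries as a Sitemap protocol XML document."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url_path, mtime in entries:
        lastmod = time.strftime("%Y-%m-%d", time.gmtime(mtime))
        lines.append("  <url>")
        lines.append("    <loc>%s</loc>" % escape(base_url + url_path))
        lines.append("    <lastmod>%s</lastmod>" % lastmod)
        lines.append("    <changefreq>%s</changefreq>" % changefreq)
        lines.append("    <priority>%s</priority>" % priority)
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

Because the scan only sees files on disk, a URL such as /search.php?category=123 can never appear in the output; only /search.php itself would be listed.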

    Bottom line is that Google Sitemaps cannot hurt you.

    BTW, even if you do not use a Google Sitemap (there's no reason why you shouldn't) then you should at least submit your website to Google Webmaster Tools so you can gain access to important indexing information about your website.

    Yours truly,
    Cusimano.Com Corporation
    per: David Cusimano

  17. #17
    ABW Ambassador cusimano's Avatar
    The latest beta version of the sitemap.pl Google Sitemap Generator script now generates a log file showing exactly what was included/excluded and why. The sitemap-log.txt file is written to the same directory where sitemap.pl is located. The log is not visible via your web browser, so unless you have ssh access, download the log via FTP and open it locally.

    The latest beta is available in the sitemap.pl download directory.
    Information about sitemap.pl is available at the sitemap.pl information page.

    (as of this post, the beta is v10.03.09-beta; so that version or higher has the log feature)

    Yours truly,
    Cusimano.Com Corporation
    per: David Cusimano
    Affiliate Tools: Datafeed Merge
