Thread: sitemap.pl Google Sitemap Generator

 
  #1  
Old July 30th, 2008, 04:26 PM
ABW Ambassador
Join Date: January 18th, 2005
Location: Toronto, Canada
Posts: 1,369
We are pleased to announce the release of the sitemap.pl Google Sitemap Generator script.

http://www.c3scripts.com/sitemap/

Dynamically generates Google Sitemap sitemap.xml and sitemap.txt files by automatically scanning your webserver's hard drive for files.

Note: Intended for use with real files that are physically located on your webserver. Does not recognize virtual files generated via .htaccess (mod_rewrite).

Add/Delete HTML files on your website and your sitemap will reflect those changes. No more need to manually generate a Google Sitemap. It's all automatic with sitemap.pl. Set it up and forget about it.

Lifetime license: $14.95 as shareware -- try 100% fully functional version before you buy.

Yours truly,
Cusimano.Com Corporation
per: David Cusimano
  #2  
Old July 30th, 2008, 09:45 PM
No Longer Banned!
Join Date: January 19th, 2008
Location: Wilmer, Texas 75172
Posts: 940
I hear what you just said about "your web server". But what if I pay for Yahoo! hosting?

Will it still work ?

Steve
__________________
DreamLinux.net | Registered Linux User 453976 | PM me to view our sites. It's a Google thing.
  #3  
Old August 1st, 2008, 05:01 PM
ABW Ambassador
Join Date: January 18th, 2005
Location: Toronto, Canada
Posts: 1,369
Your website needs to support cgi-bin and .htaccess (mod_rewrite) for sitemap.pl to work.

Yours truly,
Cusimano.Com Corporation
per: David Cusimano
  #4  
Old August 1st, 2008, 05:09 PM
ABW Ambassador
Join Date: January 17th, 2005
Location: Tropical Mountaintop
Posts: 5,407
I'm looking at this; it looks extremely useful, and at a good price.
  #5  
Old August 11th, 2008, 11:10 PM
ABW Ambassador
Join Date: January 17th, 2005
Location: Tropical Mountaintop
Posts: 5,407
Having some problems: I'm getting only a 400 response, so I filled out a trouble ticket.
EDIT:
It's working now, and I see what took so long: it was also mapping my subdomains. I need a little more information on the "Optional Configuration file", since there is no hint on how to format it. I read about it and I looked at the skip file, but there is no info on the format of the exclusions file. Help?
  #6  
Old August 12th, 2008, 03:37 AM
ABW Ambassador
Join Date: January 18th, 2005
Location: Toronto, Canada
Posts: 1,369
Note that sitemap.pl will display its output only if there is not an actual /sitemap.txt (or /sitemap.xml) file. It appears that your website has an actual .xml file already (DOMAIN.com/sitemap.xml) that was generated by some other sitemap generator software. If you want sitemap.pl to dynamically generate the /sitemap.xml file then you should remove your current /sitemap.xml file.

For the format of the exclusions file, see:
http://www.c3scripts.com/sitemap/doc...#configuration

The exclusions file is a text file that you can create with a plain text editor such as Windows Notepad.

Put one or more exclusions on each line. Separate each exclusion with a space or a tab or a newline/return. Everything from # to the end of the line is ignored as a comment.

To exclude all files with a certain filename ending (e.g.: .zip), specify that ending (e.g.: .zip); the dot is necessary.

To exclude a directory and everything in it, specify /DIR/ (e.g.: /downloads/). Supports specifying subdirectories too (e.g.: /downloads/plugins/).

To exclude a file, specify the file (e.g.: /downloads/plugins/email.zip).

Wildcards are supported. Character * matches zero or more occurrences of any character. Character ? matches zero or more occurrences of any character EXCEPT for the / directory separator character.
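
For readers who want to see how those rules fit together, here is a minimal Python sketch of a parser and matcher for the format described above. It is illustrative only and is not sitemap.pl's actual code. The function names (parse_exclusions, rule_to_regex, is_excluded) are hypothetical, and how a rule is anchored (ending, directory prefix, or whole file) is an assumption drawn from the examples in this post.

```python
import re

def parse_exclusions(text):
    """Return the list of exclusion tokens found in an exclusions file."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]   # from # to end of line is a comment
        rules.extend(line.split())     # tokens separated by spaces or tabs
    return rules

def rule_to_regex(rule):
    """Translate one exclusion into a compiled regex (anchoring is assumed)."""
    parts = []
    for ch in rule:
        if ch == "*":
            parts.append(".*")         # zero or more of any character
        elif ch == "?":
            parts.append("[^/]*")      # zero or more of any character except /
        else:
            parts.append(re.escape(ch))
    body = "".join(parts)
    if rule.startswith("/") and rule.endswith("/"):
        return re.compile("^" + body)          # a directory and everything in it
    if rule.startswith("/"):
        return re.compile("^" + body + "$")    # one specific file
    return re.compile(body + "$")              # a filename ending such as .zip

def is_excluded(path, rules):
    """True if the site-relative path matches any exclusion."""
    return any(rule_to_regex(rule).search(path) for rule in rules)

# Example: the text of a small exclusions file and two paths checked against it
rules = parse_exclusions(".zip /downloads/   # skip archives and downloads\n")
skip_zip = is_excluded("/files/archive.zip", rules)
keep_index = is_excluded("/index.html", rules)
```

Note that ? is translated to "zero or more non-slash characters" because that is how this post defines it, which differs from the usual shell meaning of exactly one character.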

Yours truly,
Cusimano.Com Corporation
per: David Cusimano
  #7  
Old August 12th, 2008, 09:14 AM
ABW Ambassador
Join Date: January 17th, 2005
Location: Tropical Mountaintop
Posts: 5,407
I'll delete my sitemap and try again. I did create the sitemap.txt file following the instructions in the installation guide in the sitemap folder and generated a new DOMAIN.COM/sitemap.txt and it still is spidering all subdomains. Don't suppose there is any way to make it follow the rules in robots.txt?

Thank you for the prompt response, with everyone gone to Boston I was expecting a lo0Oong wait.
  #8  
Old August 12th, 2008, 10:07 AM
ABW Ambassador
Join Date: January 17th, 2005
Location: Tropical Mountaintop
Posts: 5,407
I deleted the existing sitemap and the one generated by the script. I reuploaded the sitemap.txt file with the new rules and then went again to DOMAIN.COM/sitemap.txt. It overwrote my rules with an urllist following only the rules in the script and generated a new sitemap.xml (only in the sitemap folder, not where I had deleted the existing sitemap.xml), but with all the subdomains listed again. I had to put back my old sitemap.xml file and delete those generated by the script. The script does not use the rules I put in, and I followed the format of the ?skip file (without the <html> etc. tags). If it generates a sitemap.txt file whenever I visit the DOMAIN.COM/sitemap.txt page, it is going to overwrite my rules every time, isn't it? If it is going to generate a sitemap.txt file, shouldn't the rules file have another name?
  #9  
Old August 12th, 2008, 10:22 AM
Getter Done
Join Date: January 18th, 2005
Location: Atlanta
Posts: 603
what about dynamic pages..?
detailspage.asp?xxxx
I have about 30 k of these..
__________________
Marty Ogelvie
Dallas Cowboys fan
New York Yankees fan
  #10  
Old August 12th, 2008, 11:13 AM
ABW Ambassador
Join Date: January 17th, 2005
Location: Tropical Mountaintop
Posts: 5,407
If we can get clear information on how to write our own exclusion rules then I'm sure you would have no problem. The instructions tell you to make a .txt file called sitemap.txt with a list of your exclusions and put it in the same folder as the .pl script. When you run the script it generates a new file called sitemap.txt that overwrites your rules (and ignores the rules you put in). I'm certain there is a way to do it, I just don't know what it is. It does appear to ignore all dynamically generated pages right off the shelf. Since I don't use .asp I cannot tell you for sure.
This is a terrific asset if we can get some configuration help.
  #11  
Old August 12th, 2008, 01:46 PM
ABW Ambassador
Join Date: January 18th, 2005
Location: Toronto, Canada
Posts: 1,369
Save the exclusions file as cgi-bin/sitemap/sitemap-skip.txt

I assume that subdomains are in subdirectories. So exclude those subdirectories. For example, if mybooks.DOMAIN.com is actually the subdirectory /mybooks/ then add /mybooks/ to your exclusions file. Do that for all subdomains you want to skip.

sitemap.pl saves the latest sitemaps at cgi-bin/sitemap/sitemap.txt and cgi-bin/sitemap/sitemap.xml. When you (or a spider) access /sitemap.txt (or /sitemap.xml), the .htaccess file causes cgi-bin/sitemap/sitemap.pl to be run *IF* the requested file does not exist (that's why I asked you to delete /sitemap.xml in my previous post). The first thing that sitemap.pl does is check whether cgi-bin/sitemap/sitemap.txt (and .xml) exist. If they are less than one hour old, they are output. Otherwise, sitemap.pl scans your site's directory for files, saves the output, then sends it out. Because of the cached sitemaps, subsequent requests within one hour will be processed extremely quickly.
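
The cache-or-rescan decision described above can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the script's real code: is_fresh, get_sitemap and the rebuild callable are hypothetical names, and the only detail taken from this post is the one-hour limit.

```python
import os
import time

CACHE_MAX_AGE = 3600  # one hour, in seconds, as described in the post

def is_fresh(path, now=None, max_age=CACHE_MAX_AGE):
    """True if the cached file exists and is younger than max_age seconds."""
    if now is None:
        now = time.time()
    try:
        return now - os.path.getmtime(path) < max_age
    except OSError:
        return False  # a missing or unreadable cache counts as stale

def get_sitemap(cache_path, rebuild):
    """Return the cached sitemap if fresh; otherwise rebuild, save and return it."""
    if is_fresh(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return f.read()
    text = rebuild()  # hypothetical callable that rescans the site on disk
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

Only the first request after the cache expires pays for the directory scan; every other request within the hour is a single file read.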

The sitemap.pl script only works with actual files stored on the hard drive (on my server it scans at about 500 files per second). It does not work with virtual files because it scans the hard drive; it does not do http accesses (which are much slower). Note that our DySE scripts automatically generate Google Sitemaps so sitemap.pl is not required for DySE sites. A future version of sitemap.pl will allow you to specify URLs that you want included, but sitemap.pl will not spider from there; it will only include those URLs that you specify. That feature would at least enable you to specify the root of your virtual directory, and then it would be up to Google to spider from there (sitemaps don't have to specify every file on your website; Google will spider beyond it).
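
To make the "scan the disk, don't spider over http" idea concrete, here is a small Python sketch that walks a document root and turns the files it finds into URLs, one per line as in a text sitemap. It is only an illustration: scan_site, its parameters and the default list of extensions are assumptions, not the script's real behaviour or exclusion list.

```python
import os

def scan_site(doc_root, base_url, extensions=(".html", ".htm", ".php")):
    """Walk doc_root on disk and return a sorted list of URLs for matching files."""
    urls = []
    for dirpath, dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue  # skip images, archives and anything else not listed
            full = os.path.join(dirpath, name)
            # Convert the on-disk path into a site-relative path with forward slashes
            rel = os.path.relpath(full, doc_root).replace(os.sep, "/")
            urls.append(base_url.rstrip("/") + "/" + rel)
    return sorted(urls)

def sitemap_txt(urls):
    """Format the URLs as a plain-text sitemap: one URL per line."""
    return "".join(url + "\n" for url in urls)
```

Because only real files are visited, a URL such as /search.php?category=123 can never appear in the result: there is no file on disk with that name.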

Yours truly,
Cusimano.Com Corporation
per: David Cusimano
  #12  
Old August 12th, 2008, 02:11 PM
ABW Ambassador
Join Date: January 17th, 2005
Location: Tropical Mountaintop
Posts: 5,407
Thank you, Thank you! I believe that will resolve the problem. Where I went wrong was this line in the instructions:
Quote:
» 2. Configuration File

An optional configuration file sitemap.txt located in the same directory where sitemap.pl is located can be used to tell sitemap.pl what files to exclude from the sitemap.

The sitemap.pl script has a built-in list of common exclusions (e.g.: skip all .gif files). Thus you only need to create a sitemap.txt only if sitemap.pl is listing files that you do not want to list in the sitemap.
I truly appreciate your timely help, I'm off to try again
  #13  
Old August 12th, 2008, 02:52 PM
ABW Ambassador
Join Date: January 17th, 2005
Location: Tropical Mountaintop
Posts: 5,407
Bingo!! It works very well now!
It does not include the AvantLink shops, which are dynamic, but I believe if I rewrite the sitemap-skip.txt file I can get them to be included. It excluded all excluded directories and listed only the files I wanted to have listed.

OK! Now let me find that link to go and pay..
Done. Another Happy Customer!

Last edited by 2busy; August 12th, 2008 at 03:06 PM.
  #14  
Old August 12th, 2008, 03:06 PM
ABW Ambassador
Join Date: January 18th, 2005
Location: Toronto, Canada
Posts: 1,369
The sitemap.pl documentation was updated recently.

It erroneously stated to save the exclusions file to:
cgi-bin/sitemap/sitemap.txt << THIS IS WRONG

The documentation now says to save the file to:
cgi-bin/sitemap/sitemap-skip.txt << THIS IS CORRECT

Yours truly,
Cusimano.Com Corporation
per: David Cusimano
  #15  
Old August 26th, 2008, 03:35 PM
Member
Join Date: June 21st, 2007
Posts: 57
Do you need a google xml site map if your site is already indexing properly?
My site http://dealking.com seems to be fully indexed. Can a google site map help? Can it hurt?
  #16  
Old August 26th, 2008, 04:24 PM
ABW Ambassador
Join Date: January 18th, 2005
Location: Toronto, Canada
Posts: 1,369
Google.com states: "In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it."

Google's Webmaster Help Center:
- About Sitemaps
- Working with Sitemaps

Your website already has a sitemap.xml file (created by some other program) but I can't tell how old it is. Note that my sitemap.pl script scans the website's hard drive to find files (it discovers about 500 pages per second); it does not spider the website from the outside. Thus URLs with parameters will not be included (e.g.: /search.php?category=123) and dynamic webpages will not be included either.

If your website consists mainly of real .html (or .php) files located on the hard drive, then sitemap.pl is a good Google Sitemap generator for you. It runs very fast, and you only need to set it up once and you can then forget about it. Google will use the sitemap as a guide to what to spider. Google does not guarantee that everything will be indexed. Also, Google will spider beyond the sitemap, that is, if Google finds a link to a page that does not appear in your sitemap, Google will spider that page, assuming that page is not blocked via your robots.txt file. With a Google sitemap, you're basically telling the Google spider, "I'll save you some time having to spider my website to find all my webpages; here's a list of all my pages and I'll even tell you the relative priority of each page and how often I typically update each page so you know how often you should revisit."
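
As an illustration of what such a list looks like, the Python sketch below builds a sitemap.xml string in the sitemaps.org 0.9 format, with a location, change frequency and relative priority for each page. The function name sitemap_xml and the sample entries are hypothetical, and this is not the output code of sitemap.pl.

```python
from xml.sax.saxutils import escape

def sitemap_xml(entries):
    """Build a sitemap.xml string from (loc, changefreq, priority) tuples."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for loc, changefreq, priority in entries:
        lines.append("  <url>")
        lines.append("    <loc>%s</loc>" % escape(loc))        # escape &, < and >
        lines.append("    <changefreq>%s</changefreq>" % changefreq)
        lines.append("    <priority>%.1f</priority>" % priority)
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"

# Example with made-up pages: the home page ranks above an archive page
xml_text = sitemap_xml([
    ("http://example.com/", "daily", 1.0),
    ("http://example.com/archive.html", "yearly", 0.3),
])
```

The changefreq and priority values are hints to the spider, consistent with the point above that Google does not guarantee everything will be indexed.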

Bottom line is that Google Sitemaps cannot hurt you.

BTW, even if you do not use a Google Sitemap (there's no reason why you shouldn't) then you should at least submit your website to Google Webmaster Tools so you can gain access to important indexing information about your website.

Yours truly,
Cusimano.Com Corporation
per: David Cusimano
  #17  
Old April 23rd, 2010, 06:07 PM
ABW Ambassador
Join Date: January 18th, 2005
Location: Toronto, Canada
Posts: 1,369
The latest beta version of the sitemap.pl Google Sitemap Generator script now generates a log file showing exactly what was included/excluded and why. The sitemap-log.txt file is written to the same directory where sitemap.pl is located. The log is not visible via your web browser, so unless you have ssh access, FTP the log and open it locally.

The latest beta is available in the sitemap.pl download directory.
Information about sitemap.pl is available at the sitemap.pl information page.

(as of this post, the beta is v10.03.09-beta; so that version or higher has the log feature)

Yours truly,
Cusimano.Com Corporation
per: David Cusimano
__________________
Affiliate Tools: Datafeed Merge
 
